D. Colquhoun

Lectures on Biostatistics




An Introduction to Statistics with 
Applications in Biology and Medicine 




CLARENDON PRESS • OXFORD • 1971 






Oxford University Press, Ely House, London W. 1

GLASGOW NEW YORK TORONTO MELBOURNE WELLINGTON
CAPE TOWN IBADAN NAIROBI DAR ES SALAAM LUSAKA ADDIS ABABA
DELHI BOMBAY CALCUTTA MADRAS KARACHI LAHORE DACCA
KUALA LUMPUR SINGAPORE HONG KONG TOKYO

© OXFORD UNIVERSITY PRESS 1971

PRINTED IN GREAT BRITAIN AT THE PITMAN PRESS, BATH






Preface 



'In statistics, just like in the industry of consumer goods, there are producers 
and consumers. The goods are statistical methods. These come in various kinds
and 'brands' and in great and often confusing variety. For the consumer, the
applier of statistical methods, the choice between alternative methods is often
difficult and too often depends on personal and irrational factors. 

The advice of producers cannot always be trusted implicitly. They are apt — 
as is only natural — to praise their own wares. The advice of consumers — based 
on experience and personal impressions — cannot be trusted either. It is well 
known among applied statisticians that in many fields of applied science, e.g.
in industry, experience, especially 'experience of a lifetime', compares
unfavourably with objective scientific research: tradition and aversion from
innovation are usually strong impedimenta for the introduction of new methods,
even if these are better than the old ones. This also holds for statistics.'

(J. Hemelrijk 1961).



During the preparation of the courses for final year students, mostly 
of pharmacology, in Edinburgh and London on which this book is 
based, I have often been struck by the extent to which most textbooks, 
on the flimsiest of evidence, will dismiss the substitution of assumptions 
for real knowledge as unimportant if it happens to be mathematically 
convenient to do so. Very few books seem to be frank about, or perhaps 
even aware of, how little the experimenter actually knows about the 
distribution of errors of his observations, and about facts that are 
assumed to be known for the purposes of making statistical calculations. 
Considering that the purpose of statistics is supposed to be to help in 
the making of inferences about nature, many texts seem, to the 
experimenter, to take a surprisingly deductive approach (if assumptions 
a, b, and c were true then we could infer such and such). It is also 
noticeable that in the statistical literature, as opposed to elementary 
textbooks, a vast number of methods have been proposed, but remark- 
ably few have been assessed to see how they behave under the condi- 
tions (small samples, unknown distribution of errors, etc.) in which 
they are likely to be used. 

These considerations, which are discussed at greater length in the 
text, have helped to determine the content and emphasis of the methods 
in this book. Where possible, methods have been advocated that 
involve a minimum of untested assumptions. These methods, which 
occur mostly in Chapters 7-11, have the secondary advantage that 
they are much easier to understand at the present level than the 
methods, such as Student's t test and the chi-squared test (also described 
and exemplified in this book) which have, until recently, been the main 
sources of misery for students doing first courses in statistics. 

In Chapter 12 and also in § 2.7 an attempt has been made to deal 
with non-linear problems, as well as with the conventional linear ones. 
Statistics is heavily dominated by linear models of one sort and another, 
mainly for reasons of mathematical convenience. But the majority of 
physical relationships to be tested in practical science are not straight 
lines, and not linear in the wider sense described in § 12.7, and attempts 
to make them straight may lead to unforeseen hazards (see § 12.8), 
so it is unrealistic not to discuss them, even in an elementary book. 

In Chapters 13 and 14, calibration curves and assays are discussed. 
The step by step description of parallel line assays is intended to 
bridge the gap between elementary books and standard works such as 
that of Finney (1964). 

In Chapter 5 and Appendix 2 some aspects of random ('stochastic') 
processes are discussed. These are of rapidly increasing importance to 
the practical scientist, but the textbooks on the subject have always 
seemed to me to be among the most incomprehensible of all statistics 
books, partly, perhaps, because there are not really any elementary 
ones. Again I have tried to bridge the gap. 

The basic ideas are described in Chapters 1-4. They may be boring, 
but the ideas in them are referred to constantly in the later chapters 
when these ideas are applied to real problems, so the reader is earnestly 
advised to study them. 

There is still much disagreement about the fundamental principles 
of inference, but most statisticians, presented with the problems 
described, would arrive at answers similar to those presented here, 
even if they justified them differently, so I have felt free to choose the 
justifications that make the most sense to the experimenter. 

I have been greatly influenced by the writing of Professor Donald 
Mainland. His Elementary medical statistics (1963), which is much more 
concerned with statistical thinking than statistical arithmetic, should 
be read not only by every medical practitioner, but by everyone who 
has to interpret observations of any sort. If the influence of Professor 
Mainland's wisdom were visible in this book, despite my greater 
concern with methods, I should be very happy. 

I am very grateful to many statisticians who have patiently put up 
with my pestering over the last few years. If I may, invidiously, 
distinguish two in particular they would be Professor Mervyn Stone 
who read most of the typescript and Dr. A. G. Hawkes who helped 
particularly with stochastic processes. I have also been greatly helped 
by Professor D. R. Cox, Mr. I. D. Hill, Professor D. V. Lindley, and 
Mr. N. W. Please, as well as many others. Needless to say, none of 
these people has any responsibilities for errors of judgment or fact that I 
have doubtless persisted with, in spite of their best efforts. I am also 
very grateful to Professor C. R. Oakley for permission to quote extensively 
from his paper on the purity-in-heart index in § 7.8. 

University College London                                        D. C. 

April 1970 



STATISTICAL TABLES FOR USE WITH THIS BOOK 

The appendix contains those tables referred to in the text that are 
not easily available elsewhere. Standard tables, such as normal distri- 
bution, Student's t, variance ratio, and random sampling numbers, 
are so widely available that they have not been included. Any tables 
should do. Those most referred to in the text are Fisher and Yates 
Statistical tables for biological, agricultural and medical research (6th 
edn 1963, Oliver and Boyd), and Pearson and Hartley Biometrika 
tables for statisticians (Vol. 1, 3rd edn 1966, Cambridge University 
Press). The former has more about experimental designs; the latter 
has tables of the Fisher exact test for a 2 × 2 contingency table (see § 8.2), 
but anyone doing many of these should get the full tables: Finney, 
Latscha, Bennett, and Hsu Tables for testing significance in a 2x2 
table (1963, Cambridge University Press). The Cambridge elementary 
statistical tables (Lindley and Miller, 1968, Cambridge University Press) 
give the normal, t, chi-squared, and variance ratio distributions, and 
some random sampling numbers. 






Contents 

INDEX OF SYMBOLS

1. IS THE STATISTICAL WAY OF THINKING WORTH KNOWING ABOUT?
1.1. How to avoid making a fool of yourself. The role of statistics
1.2. What is an experiment? Some basic ideas
1.3. The nature of scientific inference

2. FUNDAMENTAL OPERATIONS AND DEFINITIONS
2.1. Functions and operators. A beautiful notation for adding up
2.2. Probability
2.3. Randomisation and random sampling
2.4. Three rules of probability
2.5. Averages
2.6. Measures of the variability of observations
2.7. What is a standard error? Variances of functions of the observations. A reference list

3. THEORETICAL DISTRIBUTIONS: BINOMIAL AND POISSON
3.1. The idea of a distribution
3.2. Simple sampling and the derivation of the binomial distribution through examples
3.3. Illustration of the danger of drawing conclusions from small samples
3.4. The general expression for the binomial distribution and for its …
3.5. Random events. The Poisson distribution
3.6. Some biological applications of the Poisson distribution
3.7. Theoretical and observed variances: a numerical example

4. THEORETICAL DISTRIBUTIONS. THE GAUSSIAN (OR NORMAL) AND OTHER CONTINUOUS DISTRIBUTIONS
4.1. The representation of continuous distributions in general
4.2. The Gaussian, or normal, distribution. A case of wishful thinking?
4.3. The standard normal distribution
4.4. The distribution of t (Student's distribution)
4.5. Skew distributions and the lognormal distribution
4.6. Testing for a Gaussian distribution. Rankits and probits

5. RANDOM PROCESSES. THE EXPONENTIAL DISTRIBUTION AND THE WAITING TIME PARADOX
5.1. The exponential distribution of random intervals
5.2. The waiting time paradox

6. CAN YOUR RESULTS BE BELIEVED? TESTS OF SIGNIFICANCE AND THE ANALYSIS OF VARIANCE
6.1. The interpretation of tests of significance
6.2. Which sort of test should be used, parametric or nonparametric?
6.3. Randomization tests
6.4. Types of sample and types of measurement

7. ONE SAMPLE OF OBSERVATIONS. THE CALCULATION AND INTERPRETATION OF CONFIDENCE LIMITS
7.1. The representative value: mean or median?
7.2. Precision of inferences. Can estimates of error be trusted?
7.3. Nonparametric confidence limits for the median
7.4. Confidence limits for the mean of a normally distributed variable
7.5. Confidence limits for the ratio of two normally distributed observations
7.6. Another way of looking at confidence intervals
7.7. What is the probability of 'success'? Confidence limits for the binomial probability
7.8. The black magical assay of purity in heart as an example of binomial sampling
7.9. Interpretation of confidence limits

8. CLASSIFICATION MEASUREMENTS
8.1. Two independent samples. Relationship between various methods
8.2. Two independent samples. The randomization method and the Fisher test
8.3. The problem of unacceptable randomizations
8.4. Two independent samples. Use of the normal approximation
8.5. The chi-squared (χ²) test. Classification measurements with two or more independent samples
8.6. One sample of observations. Testing goodness of fit with chi-squared
8.7. Related samples of classification measurements. Cross-over trials

9. NUMERICAL AND RANK MEASUREMENTS. TWO INDEPENDENT SAMPLES
9.1. Relationship between various methods
9.2. Randomization test applied to numerical measurements
9.3. Two sample randomization test on ranks. The Wilcoxon (or Mann-Whitney) test
9.4. Student's t test for independent samples. A parametric test

10. NUMERICAL AND RANK MEASUREMENTS. TWO RELATED SAMPLES
10.1. Relationship between various methods
10.2. The sign test
10.3. The randomisation test for paired observations
10.4. The Wilcoxon signed ranks test for two related samples
10.5. A data selection problem arising in small samples
10.6. The paired t test
10.7. When will related samples (pairing) be an advantage?

11. THE ANALYSIS OF VARIANCE. HOW TO DEAL WITH TWO OR MORE SAMPLES
11.1. Relationship between various methods
11.2. Assumptions involved in the analysis of variance based on the Gaussian (normal) distribution. Mathematical models for real observations
11.3. Distribution of the variance ratio F
11.4. Gaussian analysis of variance for k independent samples (the one way analysis of variance). An illustration of the principle of the analysis of variance
11.5. Nonparametric analysis of variance for independent samples by randomization. The Kruskal-Wallis method
11.6. Randomized block designs. Gaussian analysis of variance for k related samples (the two-way analysis of variance)
11.7. Nonparametric analysis of variance for randomized blocks. The Friedman method
11.8. The Latin square and more complex designs for experiments
11.9. Data snooping. The problem of multiple comparisons

12. FITTING CURVES. THE RELATIONSHIP BETWEEN TWO VARIABLES
12.1. The nature of the problem
12.2. The straight line. Estimates of the parameters
12.3. Measurement of the error in linear regression
12.4. Confidence limits for a fitted line. The important distinction between the variance of y and the variance of Y
12.5. Fitting a straight line with one observation at each x value
12.6. Fitting a straight line with several observations at each x value. The use of a linearizing transformation for an exponential curve and the error of the half-life
12.7. Linearity, non-linearity, and the search for the optimum
12.8. Non-linear curve fitting and the meaning of 'best' estimate
12.9. Correlation and the problem of causality

13. ASSAYS AND CALIBRATION CURVES
13.1. Methods for estimating an unknown concentration or potency
13.2. The theory of parallel line assays. The response and dose metameters
13.3. The theory of parallel line assays. The potency ratio
13.4. The theory of parallel line assays. The best average slope
13.5. Confidence limits for the ratio of two normally distributed variables: derivation of Fieller's theorem
13.6. The theory of parallel line assays. Confidence limits for the potency ratio and the optimum design of assays
13.7. The theory of parallel line assays. Testing for non-validity
13.8. The theory of symmetrical parallel line assays. Use of orthogonal contrasts to test for non-validity
13.9. The theory of symmetrical parallel line assays. Use of contrasts in the analysis of variance
13.10. The theory of symmetrical parallel line assays. Simplified calculation of the potency ratio and its confidence limits
13.11. A numerical example of a symmetrical (2+2) dose parallel line assay
13.12. A numerical example of a symmetrical (3+3) dose parallel line assay
13.13. A numerical example of an unsymmetrical (3+2) dose parallel line assay
13.14. A numerical example of the standard curve (or calibration curve). Error of a value of x read off from the line
13.15. The (k+1) dose assay and rapid routine assays

14. THE INDIVIDUAL EFFECTIVE DOSE, DIRECT ASSAYS, ALL-OR-NOTHING RESPONSES, AND THE PROBIT TRANSFORMATION
14.1. The individual effective dose and direct assays
14.2. The relation between the individual effective dose and all-or-nothing (quantal) responses
14.3. The probit transformation. Linearization of the quantal dose response curve
14.4. Probit curves. Estimation of the median effective dose and quantal assay
14.5. Use of the probit transformation to linearize other sorts of sigmoid curve
14.6. Logits and other transformations. Relationship with the Michaelis-Menten hyperbola

APPENDIX 1. Expectation, variance and non-experimental bias
A1.1. Expectation: the population mean
A1.2. Variance
A1.3. Non-experimental bias
A1.4. Expectation and variance with two random variables. The sum of a variable number of random variables

APPENDIX 2. Stochastic (or random) processes
A2.1. Scope of the stochastic approach
A2.2. A derivation of the Poisson distribution
A2.3. The connection between the lifetime of individual adrenaline molecules and the observed breakdown rate and half-life of adrenaline
A2.4. A stochastic view of the adsorption of molecules
A2.5. The relation between the lifetime of individual radioisotope molecules and the interval between disintegrations
A2.6. Why the waiting time until the next event does not depend on when the timing is started for a Poisson process
A2.7. Length-biased sampling. Why the average length of the interval in which an arbitrary moment of time falls is twice the average length of all intervals for a Poisson process

TABLES
Table A1. Nonparametric confidence limits for the median
Table A2. Confidence limits for the parameter of a binomial distribution (i.e. the population proportion of 'successes')
Table A3. The Wilcoxon test for two independent samples
Table A4. The Wilcoxon signed ranks test for two related samples
Table A5. The Kruskal-Wallis one-way analysis of variance on ranks (independent samples)
Table A6. The Friedman two-way analysis of variance on ranks for randomised block experiments
Table A7. Table of the critical range (difference between rank sums for any two treatments) for comparing all pairs in the Kruskal-Wallis nonparametric one way analysis of variance
Table A8. Table of the critical range (difference between rank sums for any two treatments) for comparing all pairs in the Friedman nonparametric two way analysis of variance
Table A9. Rankits (expected normal order statistics)

REFERENCES
INDEX
GUIDE TO THE SIGNIFICANCE TESTS IN CHAPTERS 6-11 (at end)

Index of symbols 

Reference is to the main section in which the symbol is used, explained, or defined. 

Mathematical symbols and operations 

=               is equal to
≡               is equivalent to, is defined as
≃               is approximately equal to
>               is greater than
≫               is much greater than
≥               is equal to or greater than
a < P < b       P is between a and b (greater than a and less than b)
ⁿ√x             nth root of x
x⁻ⁿ             1/xⁿ
x!              factorial x (§ 2.1)
log x           logarithm of x. The power to which the base must be raised to
                equal x. If the base is important it is inserted (e.g. logₑ x, log₁₀ x),
                otherwise the expression holds for any base
antilog x       = aˣ where a is the base of the logarithms
Σ               add up all terms like the following (§ 2.1)
Π               multiply together all terms like the following (§ 2.1)
and             logical and (§ 2.4)
or              logical or (§ 2.4)
dy/dx, ∂y/∂x    Thompson (1965) or Massey and Kestelman (1964)

Roman symbols 

a               a constant (§ 2.7)
a               entry in a 2 × 2 table (§ 8.2)
a               estimate of α (value of y when x = x̄) (§ 12.2)
a'              estimate of value of y when x = 0 (§ 12.2)
a or â          least squares value of a (§§ 12.2, 12.7)
A               total in 2 × 2 table (§ 8.2)
and             logical 'and' (§ 2.4)
b               estimate of β, slope of straight line (§ 12.2)
b or b̂          least squares estimate of β (§§ 12.2, 12.7)
b_S             slope for standard line (§ 13.4)
b_U             slope for unknown line (§ 13.4)
b               denominator of a ratio (§ 13.5)
B               total in 2 × 2 table (§ 8.2)
c               a constant (§ 2.7)
c               concentration (§ A2.4)
𝒞               population value of coefficient of variation (§ 2.6)
C               sample estimate of 𝒞 (§ 2.6)
C               total in 2 × 2 table (§ 8.2)
𝒸ℴ𝓋(x, y)       population covariance of x and y (§ 2.6)
cov(x, y)       sample estimate of 𝒸ℴ𝓋(x, y)
d               difference between 2 observations (§§ 10.1, 10.6)
d̄               mean of d values (§ 10.6)
d               difference between Y and y (§ 12.2)
d               Dunnett's (1964) statistic (§ 11.9)
D               ratio of each dose to the next lowest dose (§ 13.2)
e               base of natural logs, 2.71828 . . . (§ 2.7)
exp(x)          eˣ
e               error part of the mathematical model for an observation (§ 11.2)
E₁, E₂          events (§ 2.4)
E(x)            expectation (long run mean) of x (§ A1.1)
ED50            effective dose in 50 per cent of subjects (§ 14.3)
f(x)            any function of x (§ 2.1)
f(x)            probability density function of x (§ 4.1)
f₁(x)           probability density function for length-biased samples (§ A2.7)
f               number of degrees of freedom (§§ 8.6, 11.3)
f               frequency of observing a particular value (§ 3.6)
F(x)            distribution function. Probability of making an observation of x or
                less (§ 4.1)
F₁(x)           length-biased distribution function (§ A2.7)
F               variance ratio (§ 11.3)
g               index of significance of b (§ 13.5)
g(x)            any function of x (§§ 2.1, A1.1)
H               Kruskal-Wallis statistic (§ 11.5)
i, j            counting subscripts (§ 2.1)
k               number of classes (§§ 8.5, 8.6)
k               number of groups, treatments (Chap. 11)
k               number of x values (§ 12.6)
k               rate constant (§§ 12.2, 12.6, A2.3)
k_S, k_U        number of dose levels for standard, unknown (§§ 13.1, 13.8)
k               k_S + k_U (§§ 13.1, 13.8)
𝒦               parameter of hyperbola (e.g. Michaelis constant) (§ 12.8)
K               a sample estimate of 𝒦 (§ 12.8)
K̂               the least squares estimate of 𝒦 (§ 12.8)
L               linear function of observations (§ 2.7)
L               orthogonal contrasts (§ 13.8)
LD50            lethal dose in 50 per cent of subjects (§ 14.3)
m               an estimate of the mean, μ (§ 2.5)
𝓂               population mean number of events in unit time (space, etc.) for
                Poisson (§§ 3.5, 5.1, A2.2)
m               sample estimate of 𝓂 (§ 3.6)
…               population median (§ 7.3)
m               number of new observations (§ …)
m               estimate of a ratio, a/b (§ 7.5)
M               log R (potency ratio) (§ 13.3)
n, N            number of observations, an integer (§ 2.1)
n               number of binomial 'trials' (§ 3.2)
n(t)            number of intervals ≥ t (§ 5.1)
n_j             number of observations in jth group (§ 11.4)
n               number of replicate responses to each dose (§§ 13.2, 13.8)
n_S, n_U        number of responses for standard, unknown (§§ 13.2, 13.8)
N               total number of adsorption sites (§ A2.4)
N(t)            number of sites occupied, or nuclei not disintegrated, at time t
                (§§ A2.4, A2.5)
NED             normal equivalent deviation (§ 14.3)
o(Δt)           any quantity that becomes negligible for very short time-intervals
                (§ A2.2)
or              logical 'or' (§ 2.4)
p               observed proportions (§§ 3.6, 8.4, 14.4, 14.6)
p               same as F(x) (§ 4.1)
p               cumulative frequency (§ 14.2)
P[.]            probability (true or estimated) of the event in brackets (§§ 2.2, 2.4)
P[E₁ | E₂]      conditional probability of E₁ given that E₂ has happened (§ 2.4)
P               result of significance test (§ 6.1), confidence level (§§ 7.2, 7.9)
𝒫               probability of a 'success' at each trial in binomial situation
                (§§ 3.2, 3.6)
𝒫_H, 𝒫_L        high and low confidence limits for 𝒫 (§ 7.7)
…               probability that a site is occupied, empty (§ A2.4)
r               number of 'successes' (§§ 3.1, 3.2), of events (§ 3.6)
r               rank of observation forming confidence limits for median (§ 7.3)
r               number of rows (e.g. treatments) (§§ 8.5, 8.6)
r̄               mean observed r, estimate of 𝓂 (§ 8.6)
r               Pearson correlation coefficient (§ 12.9)
r_S             Spearman rank correlation coefficient (§ 12.9)
r               base of logarithms for assays (§§ 13.2, 13.3)
R               sum of ranks (§§ 9.3, 11.3, 11.6)
R               potency ratio (§§ 13.1, 13.3)
s(.)            sample standard deviation of the variable in brackets. An estimate
                of σ(.) (§§ 2.1, 2.6, 2.7)
s²(.)           sample variance of variable in brackets ≡ var(.). Square of s(.).
                An estimate of σ²(.) ≡ 𝓋𝒶𝓇(.) (§§ 2.1, 2.6, 2.7)
s²_max, s²_min  largest and smallest of a set of sample variances (§ 11.2)
S               Friedman statistic (§ 11.7)
S               Scheffé function (§ 11.9)
S               sum of squared deviations to be minimized (§ 12.2)
S               standard preparation (§ 13.1)
t               Student's statistic (§ 4.4)
t               time, time interval between events, lifetime (Chapter 5, Appendix 2)
𝐭               time considered as a random variable, t denoting a particular value
                of 𝐭 (§ A2.6)
Δt              a time interval (§§ 5.1, A2.2)
T_i., etc.      total of observations (§ 2.1)
𝒯               population mean interval between events = λ⁻¹ (§§ 5.1, A1.1, A2.2)
T               sample estimate of 𝒯 (§ 5.1)
T               sum of positive ranks, or of negative ranks, whichever is the
                smaller (§ 10.4)
u               standard normal (Gaussian) deviate (§ 4.3)
U               unknown preparation (§ 13.1)
v₁₁, etc.       variance multipliers (§ 13.6)
𝓋𝒶𝓇(.)          population variance of variable in brackets. Same as σ²(.)
                (§§ 2.6, 2.7)
var(.)          sample estimate of 𝓋𝒶𝓇(.). Same as s²(.) (§§ 2.6, 2.7)
𝒱               population (true) maximum value of y (§ 12.8)
V               a sample estimate of 𝒱 (§ 12.8)
V̂               least squares estimate of 𝒱 (§ 12.8)
w               weight (§ 2.5)
x               any variable (§§ 2.1, 2.7, 4.1)
𝐱               x considered as a random variable, x denoting a particular value of
                𝐱 (§§ 4.1, A1.1)
…               geometric mean of x values (§ 2.5)
x               independent variable in curve fitting problems (§§ 12.1, 12.2)
x               log z (Chapter 13)
x_o, x_e        observed frequency (integer) and expected frequency (§ 8.5)
x̄_i., etc.      means of observations (§ 2.1)
y               observed value of dependent variable in curve-fitting problems
                (§§ 12.1, 12.2)
Y               value of dependent variable read off fitted curve (§ 12.2)
z_S, z_U        doses of standard and unknown (§ 13.1)
z'              doses giving equal responses (§ 13.3)

Greek symbols 

α (alpha)       probability of an error of the first kind (§ 6.1)
α               population value of y when x = x̄, estimated by a (§§ 12.2, 12.7)
α               population value of numerator of ratio (§ 13.5)
α               orthogonal coefficient (§ 13.8)
β (beta)        probability of an error of the second kind (§ 6.1)
β               population value of slope, estimated by b (§§ 12.2, 12.7)
β_i             block effect for ith block in model observation (§ 11.2)
β               population value for denominator of ratio (§ 13.5)
Δ (delta)       change in, interval in, value of following variable (§§ 5.1, A2.2)
λ (lambda)      population (true) mean number of events in unit time
                (§§ 5.1, A2.2)
λ               measure of probability of catabolism, disintegration, adsorption in
                a short time-interval (§§ A2.3-A2.6)
μ (mu)          population mean (§§ 2.5, 4.2, 12.2, A1.1)
μ               population (true) value of ratio (§ 13.5)
μ               measure of probability of desorption (§ A2.4)
π               3.141593 . . .
Π (capital pi)  multiply the following (§ 2.1)
σ²(.) (sigma)   population (true) variance of variable in brackets. Same as
                𝓋𝒶𝓇(.) (§§ 2.1, 2.6, 2.7, A1.2)
Σ (sigma)       add up the following (§ 2.1)
τ_j (tau)       treatment effect for jth treatment in model observation (§ 11.2)
χ²(f) (chi)     chi-squared statistic with f degrees of freedom (§ 8.5)
χ²_rank         rank statistic distributed approximately as χ² (§ 11.7)
ω² (omega)      interblock standard deviation (§ 11.2)




1. Is the statistical way of thinking 
worth bothering about? 



'I wish to propose for the reader's favourable consideration a doctrine which 
may, I fear, appear wildly paradoxical and subversive. The doctrine in question 
is this: that it is undesirable to believe a proposition when there is no ground 
whatever for supposing it true. I must of course, admit that if such an opinion 
became common it would completely transform our social life and our political 
system: since both are at present faultless, this must weigh against it. I am also 
aware (what is more serious) that it would tend to diminish the incomes of 
clairvoyants, bookmakers, bishops and others who live on the irrational hopes of 
those who have done nothing to deserve good fortune here or hereafter. In spite 
of these grave arguments, I maintain that a case can be made out for my paradox, 
and I shall try to set it forth.' 

BERTRAND RUSSELL, 1935 

(On the Value of Scepticism) 



1.1. How to avoid making a fool of yourself. The role of statistics 

It is widely held by non-statisticians, like the author, that if you do 
good experiments statistics are not necessary. They are quite right. 
At least they are right as long as one makes an exception of the import- 
ant branch of statistics that deals with processes that are inherently 
statistical in nature, so-called 'stochastic' processes (see Chapters 3 
and 5 and Appendix 2). The snag, of course, is that doing good experi- 
ments is difficult. Most people need all the help they can get to prevent 
them making fools of themselves by claiming that their favourite 
theory is substantiated by observations that do nothing of the sort. 
And the main function of that section of statistics that deals with 
tests of significance is to prevent people making fools of themselves. 
From this point of view, the function of significance tests is to prevent 
people publishing experiments, not to encourage them. Ideally, indeed, 
significance tests should never appear in print, having been used, if at 
all, in the preliminary stages to detect inadequate experiments, so that 
the final experiments are so clear that no justification is needed. 

The main aim of this book is to produce a critical way of thinking 
about experimentation. This is particularly necessary when attempting 
to measure abstract quantities such as pain, intelligence, or purity in 
heart (§ 7.8). As Mainland (1964) points out, most of us find arithmetic 
easier than thinking. A particular effort has therefore been made to 
explain the rational basis of as many methods as possible. This has 
been made much easier by starting with the randomization approach 
to significance testing (Chapters 6-11), because this approach is easy to 
understand, before going on to tests like Student's t test. The numerical 
examples have been made as self-contained as possible for the benefit 
of those who are not interested in the rational basis. 

Although it is difficult to achieve these aims without a certain 
amount of arithmetic, all the mathematical ideas needed will have 
been learned by the age of 15. The only difficulty may be the occa- 
sional use of longer formulae than the reader may have encountered 
previously, but for the vast majority of what follows you do not need 
to be able to do anything but add up and multiply. Adding up is so 
frequent that a special notation for it is described in detail in § 2.1. 
You may find this very dull and boring until familiarity has revealed its 
beauty and power, but do not on any account miss out this section. 
In a few sections some elementary calculus is used, though anything at 
all daunting has been confined to the appendices. These parts can be 
omitted without affecting understanding of most of the book. If you 
know no calculus at all, and there are far more important reasons for 
no biologist being in this position than the ability to understand the 
method of least squares, try Silvanus P. Thompson's Calculus made Easy 
(1965). 

A list of the uses and scope of statistical methods in laboratory and 
clinical experimentation is necessarily arbitrary and personal. Here is 
mine. 

(1) Statistical prudence (Lancelot Hogben's phrase) encourages the 
design of experiments in a way that allows conclusions to be drawn from 
them. Some of the ideas, such as the central importance of randomiza- 
tion (see §§ 2.3, 6.3, and Chapters 8-11) are far from intuitively obvious 
to most people at first. 

(2) Some processes are inherently probabilistic in nature. There is no 
alternative to a statistical approach in these cases (see Chapter 5 and
Appendix 2). 

(3) Statistical methods allow an estimate (usually optimistic, see 
§ 7.2) of the uncertainty of the conclusions drawn from inexact observa- 
tions. When results are assessed by hopeful intuition it is not uncommon 
for more to be inferred from them than they really imply. For example,
Schor and Karton (1966) found that, in no less than 72 per cent of a
sample of 149 articles selected from 10 highly regarded medical journals,
conclusions were drawn that were not justified by the results presented. 
The most common single error was to make a general inference from 
results that could quite easily have arisen by chance. 

(4) Statistical methods can only cope with random errors and in 
real experiments systematic errors (bias) may be quite as important as 
random ones. No amount of statistics will reveal whether the pipette 
used throughout an experiment was wrongly calibrated. Tippett (1944) 
put it thus: 'I prefer to regard a set of experimental results as a biased
sample from a population, the extent of the bias varying from one kind 
of experiment and method of observation to another, from one experi- 
menter to another, and, for any one experimenter, from time to time.' 
It is for this reason, and because the assumptions made in statistical 
analysis are not likely to be exactly true, that Mainland (1964) em- 
phasizes that the great value of statistical analysis, and in particular 
of the confidence limits discussed in Chapter 7, is that 'they provide 
a kind of minimum estimate of error, because they show how little a 
particular sample would tell us about its population, even if it were a 
strictly random sample.' 

(5) Even if the observations were unbiased, the method of calculating 
the results from them may introduce bias, as discussed in §§2.6 and 
12.8 and Appendix 1. For example, some of the methods used by 
biochemists to calculate the Michaelis constant from observations of the 
initial velocity of enzymic reactions give a biased result even from
unbiased observations (see § 12.8). This is essentially a statistical
phenomenon. It would not happen if the observations were exact.

(6) The important point to realize is that by their nature statistical 
methods can never prove anything. The answer always comes out as 
a probability. And exactly the same applies to the assessment of 
results by intuition, except that the probability is not calculated but 

1.2. What is an experiment? Some basic ideas 

Statistics originally meant state records (births, deaths, etc.) and its 
popular meaning is still much the same. However, as is often the case, 
the scientific meaning of the word is much narrower. It may be illus- 
trated by an example. 

Imagine a solution containing an unknown concentration of a drug.
If the solution is assayed many times, the resulting estimate of
concentration will, in general, be different at every attempt. An
unknown true value such as the unknown true concentration of the 
drug is called a parameter. The mean value from all the assays gives an 
estimate of this parameter. An approximate experimental estimate 
(the mean in this example) of a parameter is called a statistic. It is
calculated from a sample of observations from the population of all 
possible observations. 
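
The distinction between a parameter and a statistic can be illustrated with a small simulation. This is only a sketch: the true concentration, the size of the assay error, and the number of assays are all invented for illustration, and a pseudo-random generator stands in for real experimental error.

```python
import random
import statistics

TRUE_CONCENTRATION = 10.0   # the parameter (hypothetical; unknown in a real assay)
ASSAY_ERROR_SD = 0.5        # hypothetical standard deviation of the random assay error


def simulate_assays(n_assays, seed=1):
    """Return n_assays simulated estimates of the concentration."""
    rng = random.Random(seed)
    return [rng.gauss(TRUE_CONCENTRATION, ASSAY_ERROR_SD) for _ in range(n_assays)]


sample = simulate_assays(20)
sample_mean = statistics.mean(sample)   # the statistic: an estimate of the parameter

print("individual assays differ:", len(set(sample)) > 1)
print("sample mean (statistic): %.3f" % sample_mean)
```

In a real assay only `sample` and `sample_mean` would be available; `TRUE_CONCENTRATION` appears here solely because the data are simulated.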

In the example just discussed the individual assay results differed 
from the parameter value only because of experimental error. However, 
there is another slightly different situation, one that is particularly 
common in the biological sciences. For example, if identical doses of 
a drug are given to a series of people and in each case the fall in blood 
sugar level is measured then, as before, each observation will be differ- 
ent. But in this case it is likely that most of the difference is real. 
Different individuals really do have different falls in blood sugar level, 
and the scatter of the results will result largely from this fact and only 
to a minor extent from experimental errors in the determination of the 
blood sugar level. The average fall of blood sugar level may still be of 
interest if, for example, it is wished to compare the effects of two 
different hypoglycemic drugs. But in this case, unlike the first, the
parameter of which this average is an estimate, the true fall in blood 
sugar level, is no longer a physical reality, whereas the true concentra- 
tion was. Nevertheless, it is still perfectly all right to use this average as 
an estimate of a parameter (the value that the mean fall in blood 
sugar level would approach if the sample size were increased indefin- 
itely) that is used simply to define the distribution (see §§3.1 and 
4.1) of the observations. Whereas in the first case the average of all 
the assays was the only thing of interest, the individual values being 
unimportant, in the second case it is the individual values that are of 
importance, and the average of these values is only of interest in so far 
as it can be used, in conjunction with their scatter, to make predictions 
about individuals. 

In short, there are two problems, the older one of estimating a true 
value by imperfect methods, and the now common problem of measur- 
ing effects that are really variable (e.g. in different people) by relatively 
very accurate methods. Both these problems can be treated by the 
same statistical methods, but the interpretation of the results may be 
different for each. 

With few exceptions, scientific methods were applied in medicine and 
biology only in the nineteenth century and in education and the social
sciences only very recently. It is necessary to distinguish two sorts of
scientific method often called the observational method and the 
experimental method. Claude Bernard wrote : 'we give the name observer 
to the man who applies methods of investigation, whether simple or 
complex, to the study of phenomena which he does not vary and which 
he therefore gathers as nature offers them. We give the name experi- 
menter to the man who applies methods of investigation, whether 
simple or complex, so as to make natural phenomena vary.' In more 
modern terms Mainland (1964) writes: 'the distinctive feature of an 
experiment, in the strict sense, is that the investigator, wishing to 
compare the effects of two or more factors (independent variables) 
assigns them himself to the individuals (e.g. human beings, animals or 
batches of a chemical substance) that comprise his test material.' 
For example, the type and dose of a drug, or the temperature of an 
enzyme system, are independent variables. 

The observational method, or survey method as Mainland calls it, 
usually leads to a correlation; for example, a correlation between 
smoking habits and death from lung cancer, or between educational 
attainment and type of school. But the correlation, however perfect 
it may be, does not give any information at all about causation,† such
as whether smoking causes lung cancer. The method lends itself only 
too easily to the confusion of sequence with consequence. 'It is the 
post hoc, ergo propter hoc of the doctors, into which we may very easily 
let ourselves be led' (Claude Bernard). 

This very important distinction is discussed further in §§12.7 and 
12.9. Probably the most useful precaution against the wrong interpreta- 
tion of correlations is to imagine the experiment that might in principle 
be carried out to decide the issue. It can then be seen that bias in the 
results is controlled by the randomization process inherent in experi- 
ments. If all that is known is that pupils from type A schools do better 
than those from type B schools it could well have nothing to do with 
the type of school but merely, for example, be that children of educated 
parents go to type A school and those of uneducated parents to type B 
schools. If proper experimental methods were applied in the situations 
mentioned above the first step would be to divide the population (or a 
random sample from it) by a random process, into two groups. One 
group would be instructed to smoke (or to go to a particular sort of 
school), the other group would be instructed not to smoke (or go to
a different sort of school). The difficulty in the medical and social
sciences is usually that an experiment may be considered unethical.
Since it can hardly be assumed a priori that there is an equal chance of
smoking having good or bad effects on health, it is not possible to
instruct a group of people to smoke, though it should be perfectly
acceptable to leave one randomly selected group to its normal habits
(including smoking by some of the group) and to instruct the other
group to stop smoking.

† It is not even necessarily true that zero correlation rules out causation, because
lack of correlation does not necessarily imply independence (see § 12.9).

Often the situation is not as bad as this, however. There is genuine 
doubt about the relative merits of different sorts of school, and, very 
often, about different sorts of therapy, so in these cases it is not merely 
ethical to do a proper experiment, but it would be unethical, though 
not unusual, not to do the experiment. 

1.3. The nature of scientific inference 

We are concerned with the establishment of new knowledge about the 
real world. Therefore it will do no harm to mention something of the 
logical foundations of inference before starting on methods. 

The earliest natural philosophers based their work largely on 
deductive arguments from axioms. The only criterion for a valid set of 
axioms is that it should not be possible to deduce contradictory con- 
clusions from them, i.e. that the axioms should be consistent. Even if 
this is so it has no bearing on whether or not the axioms are true. 

Later it came to be supposed that knowledge of the natural world 
could only be obtained by induction of general theories from particular 
observations, and not, as had previously been assumed, by deduction of 
the particular case from a general axiom. 

The process of induction must clearly be subject to uncertainties, 
but it was not until much later that these uncertainties were investigated 
and attempts made to measure them. During the seventeenth century 
the study of probability theory was started by Fermat and Pascal. This
was, and still is, a branch of mathematics, wholly deductive in nature. 

Probability theory and experimental method grew up alongside 
each other, but largely separately. One of the first attempts at a 
synthesis came when astronomers wanted to find out whether the stars 
were distributed randomly, or in some sort of order. What was needed 
was a method for determining the probability of a hypothesis being true 
given some experimental observations relevant to it. For example: 
(1) the hypothesis that the stars are randomly distributed; (2) the 
hypothesis that morphine is a better analgesic than aspirin; or (3)
the hypothesis that state schools provide a better education than
private schools. 

The use to which natural scientists wanted to put probability theory 
was, it seems, of a quite different kind from that for which the theory 
was designed. All that probability theory would answer were questions 
such as : Given certain premises about the thorough shuffling of the pack 
and the honesty of the players, what is the probability of drawing four 
consecutive aces ? This is a statement of the probability of making 
some observations, given an hypothesis (that the cards are well shuffled, 
and the players honest), a deductive statement of direct probability. 
What was needed was a statement of the probability of the hypothesis, 
given some observations — an inductive statement of inverse probability. 

An answer to the problem was provided by the Rev. Thomas Bayes 
in his Essay towards solving a Problem in the Doctrine of Chances published 
in 1763, two years after his death. Bayes' theorem states:

$$\text{posterior probability of a hypothesis} = \text{constant} \times \text{likelihood of hypothesis} \times \text{prior probability of the hypothesis} \tag{1.3.1}$$

In this equation prior (or a priori) probability means the probability 
of the hypothesis being true before making the observations under 
consideration, the posterior (or a posteriori) probability is the probability 
after making the observations, and the likelihood of the hypothesis is 
defined as the probability of making the given observations if the
hypothesis under consideration were in fact true. This technical 
definition of likelihood will be encountered again later. 
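
Equation (1.3.1) can be made concrete with a short numerical sketch. The two hypotheses, their prior probabilities, and their likelihoods below are invented for illustration. The 'constant' is taken to be whatever makes the posterior probabilities of a set of mutually exclusive and exhaustive hypotheses add up to one.

```python
def posterior_probabilities(priors, likelihoods):
    """Apply posterior = constant x likelihood x prior to a set of hypotheses.

    priors and likelihoods are dicts keyed by hypothesis name. The hypotheses
    are assumed mutually exclusive and exhaustive, so the constant is chosen
    to make the posterior probabilities sum to 1.
    """
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    constant = 1.0 / sum(unnormalized.values())
    return {h: constant * value for h, value in unnormalized.items()}


# Hypothetical numbers, for illustration only.
priors = {"H1": 0.2, "H2": 0.8}
likelihoods = {"H1": 0.9, "H2": 0.3}   # P(observations | hypothesis)

posterior = posterior_probabilities(priors, likelihoods)
for name, p in posterior.items():
    print(name, round(p, 3))
```

With these made-up figures the observations favour H1 (its likelihood is three times larger), yet H2 keeps the larger posterior probability because its prior probability was four times larger.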

The wrangle about the interpretation of Bayes' theorem continues 
to the present day. Is 'the probability of an hypothesis being true' a
meaningful idea? The great mathematician Laplace assumed that if 
nothing were known of the merits of rival hypotheses then their prior 
probabilities should be considered equal ('the equipartition of ignor- 
ance'). Later it was suggested that Bayes' theorem was not really 
applicable except in a small proportion of cases in which valid prior 
probabilities were known. This view is still probably the most common, 
but there is now a strong school of thought that believes the only 
sound method of inference is Bayesian. An uncontroversial use of 
Bayes' theorem, in medical diagnosis, is mentioned in § 2.4. 

Fortunately, in most, though not all, cases the practical results are 
the same whatever viewpoint is adopted. If the prior probabilities of 
several mutually exclusive hypotheses are known or assumed to be 
equal then the hypothesis with the maximum posterior probability
will also be that with the maximum likelihood. In fact a popular
procedure is to ignore the prior probability altogether and to select the 
hypothesis with the maximum likelihood. This procedure avoids 
altogether the making of statements of inverse probability that many 
people think to be invalid, but loses something in interpretability. 
The probability considered is the probability of the observations calcu- 
lated assuming the hypothesis in question to be true — a statement of 
direct probability. 
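
The agreement between the two procedures when the prior probabilities are equal can be checked numerically. The three hypotheses and their likelihoods here are made up for the purpose; nothing about them comes from the text.

```python
# Hypothetical likelihoods, P(observations | hypothesis), for three rival hypotheses.
likelihoods = {"A": 0.05, "B": 0.20, "C": 0.10}

# Equal prior probabilities ('the equipartition of ignorance').
priors = {h: 1.0 / len(likelihoods) for h in likelihoods}

# Posterior is proportional to likelihood x prior; normalize so the total is 1.
unnormalized = {h: likelihoods[h] * priors[h] for h in likelihoods}
total = sum(unnormalized.values())
posteriors = {h: value / total for h, value in unnormalized.items()}

max_likelihood_choice = max(likelihoods, key=likelihoods.get)
max_posterior_choice = max(posteriors, key=posteriors.get)

print(max_likelihood_choice, max_posterior_choice)
```

Because every likelihood is multiplied by the same prior, the ordering of the hypotheses is unchanged, so both rules pick the same one.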

It has been argued strongly by Karl Popper that scientific inference 
is a wholly deductive process. A hypothesis is framed by inspired guess- 
work. Its consequences are deduced and then tested experimentally. This
is certainly just how things should be done. But, as A. J. Ayer points 
out, the experiment is only useful if it is supposed that it will give the 
same result when it is repeated, and the argument leading to this 
supposition is the sort of inductive inference with which much of 
statistics is concerned. 






2. Fundamental operations and definitions



'Considering how many fools can calculate, it is surprising that it should be
thought either a difficult or a tedious task for any other fool to learn how to
master the same tricks.'

Silvanus P. Thompson



2.1. Functions and operators. A beautiful notation for adding up 

Functional notation 

If the value of one variable, say y, depends on the value of another, 
say x, then y is said to be a function of x. For example, the response to a 
drug is a function of the dose. The usual algebraic way of saying this is 
$y = f(x)$ where $f$ denotes the function. This equation is read 'y equals a
function of x'. If it is required to distinguish different functions of the
same variable then different symbols are chosen to represent each
function. For example, $y_1 = g(x)$, $y_2 = \phi(x)$. If the function $f$ were the
square root, $g$ were the logarithm, and $\phi$ the tangent then the above
equations could be written in a less abstract form as $y = \sqrt{x}$, $y_1 = \log x$,
and $y_2 = \tan x$. This notation can be extended to several variables.
If the value of $y$ depends on the value of two different variables, $x_1$
and $x_2$ say, this could be denoted $y = f(x_1, x_2)$. An example of such a
function is $y = x_1^2 + x_2$.

Needless to say, the symbols $f$, $g$, and $\phi$ do not stand for numbers
and, for example, it is very important to distinguish $y = f(x)$ from
'y equals f times x'. In the present case $f$, $g$, and $\phi$ stand for operations
carried out on the argument $x$ in just the same way as the symbol
'+' stands for the operation of addition of the quantities on each side
of the plus sign, or the symbol $d/dx$ stands for 'find the differential
coefficient with respect to x'.

In the following pages this operational notation is used frequently.
For example, $s(x)$ will stand for 'the estimated standard deviation of
x' (not 's times x').† The square of the standard deviation is called the
variance. The variance of $x$ is thus $[s(x)]^2$, which is usually written
$s^2(x)$. The situation may look even more confusing if it is wished to
denote the estimated standard deviation of a quantity like $x_1 - x_2$, i.e.
a measure of the scatter of the values of the quantity $x_1 - x_2$. Using the
notation given above this number would be written $s(x_1 - x_2)$, but this
is not the same as $s(x_1) - s(x_2)$; $s$ is an operator not a number. To add
to the difficulties it is quite common for $s(x)$ or $f(x)$ to be abbreviated
to $s$ and $f$, the argument, $x$, being understood. So in this case $s$ and $f$
do stand for numbers; the numbers $s(x)$ and $f(x)$. Brackets rather than
parentheses are sometimes used to make the notation clearer so the
standard deviation of $x_1 - x_2$ is written $s[x_1 - x_2]$.

† See § 2.6 for the definitions. Although it is commonly used, this is not really a
consistent use of the notation. The sample standard deviation, $s(x)$, is not a function of
a single variable $x$, but of the whole set of $x$ values making up the sample. And in the
case of the population standard deviation, $\sigma(x)$, $\sigma$ is really an operator on the probability
distribution of $x$ (see Appendix 1).
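
That the standard deviation of a difference is not the difference of the standard deviations is easily confirmed with a few numbers. The paired values below are invented, and the sketch simply uses Python's `statistics.stdev` (which divides by n - 1) as a stand-in for the estimated standard deviation; the book's own definition is the one given in § 2.6.

```python
import statistics

# Hypothetical paired measurements, for illustration only.
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [2.0, 2.0, 5.0, 3.0]

differences = [a - b for a, b in zip(x1, x2)]

s_of_difference = statistics.stdev(differences)                 # s(x1 - x2)
difference_of_s = statistics.stdev(x1) - statistics.stdev(x2)   # s(x1) - s(x2)

print("s(x1 - x2)    = %.3f" % s_of_difference)
print("s(x1) - s(x2) = %.3f" % difference_of_s)
```

The first quantity measures the scatter of the differences and can never be negative; the second is merely one number subtracted from another, and with these values it is negative.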

Two important operators are those used to denote the formation of 
sums and products, viz. $\Sigma$ and $\Pi$ (Greek capitals, sigma and pi). For
example, $\Sigma x$ means find the sum of all the values of x, and $\Pi x$ means
find the product of all the values of x. These operations occur often
and are discussed in more detail below.

Factorial notation 

Another operation that will occur in the following pages is written
$n!$, which is read as 'factorial n'. When n is an integer this has the
value $n(n-1)(n-2) \ldots 1$. For example, $4! = 4 \times 3 \times 2 \times 1 = 24$. A
more general definition (the gamma function) is valid also for non-
integers and occurs often in more advanced work than is dealt with
here. In the light of this more general definition, as well as for reasons of
convenience that will be apparent later, $0!$ (factorial zero) is defined as
having the value 1.
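
The factorial can be sketched in a few lines. The function name `factorial` is chosen here only for illustration. The relation used in the last two lines, that the gamma function evaluated at n + 1 equals n! for integer n, is a standard result not spelled out in the text.

```python
import math


def factorial(n):
    """n! = n(n-1)(n-2)...1 for a non-negative integer n, with 0! defined as 1."""
    result = 1
    for factor in range(2, n + 1):
        result *= factor
    return result


print(factorial(4))          # 4 x 3 x 2 x 1
print(factorial(0))          # defined as 1
print(math.gamma(4 + 1))     # gamma(n + 1) agrees with n! when n is an integer
print(math.gamma(0.5 + 1))   # the generalization also accepts the non-integer 0.5
```

Note that the loop does nothing when n is 0 or 1, so the convention that factorial zero is 1 falls out of starting the running product at 1.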

The use of the summation operator 

The operation of adding up occurs very often. The arithmetic is 
familiar, but the notation used may not be. In the following pages the 
summation operator is used often. Frequently it is written with a full 
panoply of superscript and subscripts. This makes the operation 
unambiguous, at the expense of looking a bit complicated. It is very 
well worth while (for far wider reasons than merely understanding this 
book) making sure that you can add up, so the temptation to skip this 
section should be resisted. The use of the product operator $\Pi$ is
analogous, + being replaced by ×.

Given a set of observations, for example n replicate observations on
the same animal of the fall in blood pressure in response to a drug, an
observation can be denoted $y_i$. This symbol stands for the ith fall in
blood pressure. There are n observations so in general an observation is
$y_i$ where $i = 1, 2, \ldots, n$. If $n = 5$, for example, then the five observations
are symbolized $y_1, y_2, y_3, y_4, y_5$. Note that the subscript $i$ has not
necessarily got any particular experimental significance. It is merely a method
for counting, or labelling, the individual observations.
The observations can be laid out in a table thus:

$y_1$   $y_2$   $y_3$   ...   $y_n$

Mathematicians would refer to such a table as a one-dimensional array
or a vector, but 'table' is a good enough name for now.

The instruction to add up all values of y from the first value to the
nth value is written

$$\text{sum} = \sum_{i=1}^{i=n} y_i \quad \text{or, more briefly,} \quad \sum_{i=1}^{n} y_i.$$

This expression symbolizes the number

$$\text{sum} = y_1 + y_2 + y_3 + \ldots + y_n.$$

Similarly,

$$\sum_{i=3}^{i=5} y_i \quad \text{stands for the number} \quad y_3 + y_4 + y_5.$$

Thus the arithmetic mean of n values of y is

$$\frac{\sum_{i=1}^{i=n} y_i}{n}.$$

Notice that after the summation operation, the counting, or subscripted,
variable, $i$, does not appear in the result.
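
The summation notation translates almost word for word into a few lines of code. The five values of y are invented for illustration. Because Python counts list positions from 0, the observation $y_i$ is stored at position i - 1.

```python
import math

# Five hypothetical falls in blood pressure; y[0] holds y_1, y[1] holds y_2, and so on.
y = [12.0, 15.0, 9.0, 11.0, 13.0]
n = len(y)

total = sum(y[i - 1] for i in range(1, n + 1))   # sum of y_i for i = 1, ..., n
partial = sum(y[i - 1] for i in range(3, 6))     # sum of y_i for i = 3, 4, 5
mean = total / n                                 # arithmetic mean of the n values
product = math.prod(y)                           # the product operator: + replaced by x

print(total, partial, mean, product)
```

The counting variable `i` exists only inside each summation; the results `total`, `partial`, and `mean` are plain numbers that do not involve it.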

A slightly more complicated situation arises when the observations 
can be classified in two ways. For example, if n readings of blood 
pressure (y) were taken on each of k animals the results would probably 
be arranged in a table like this : 



                               Animal (value of j)
Observation
(value of i)      1          2          3        ...       k

     1         $y_{11}$   $y_{12}$   $y_{13}$    ...    $y_{1k}$
     2         $y_{21}$   $y_{22}$   $y_{23}$    ...    $y_{2k}$
     3         $y_{31}$   $y_{32}$   $y_{33}$    ...    $y_{3k}$
     .            .          .          .                  .
     n         $y_{n1}$   $y_{n2}$   $y_{n3}$    ...    $y_{nk}$






Two subscripts, say i and j, are now needed, one for keeping count of
the observations and one for the animals. i takes the values 1, 2, 3,...,
n; and j takes the values 1, 2, 3,..., k. The ith observation on the jth
animal is thus represented by the symbol $y_{ij}$. In more general terms,
$y_{ij}$ stands for the value of y in the ith row and jth column of a table
(or two-dimensional array, or matrix) such as that shown above.
For example, a table with 3 columns and 4 rows could be written

$y_{11}$  $y_{12}$  $y_{13}$                                   2  4  3
$y_{21}$  $y_{22}$  $y_{23}$      and a particular table       1  6  4
$y_{31}$  $y_{32}$  $y_{33}$      of this size could be        5  7  6
$y_{41}$  $y_{42}$  $y_{43}$                                   8  4  5

In this case $n = 4$ and $k = 3$ and the $n \times k$ table contains $nk = 12$
observations.

The row and column totals and means can be represented by an
extension of the notation used above. For example the total of the
observations in the jth column, which may be called $T_{.j}$ for short,
would be written

$$T_{.j} = \sum_{i=1}^{i=n} y_{ij} = y_{1j} + y_{2j} + y_{3j} + \ldots + y_{nj}. \tag{2.1.1}$$

Thus the total of the first column is $T_{.1} = y_{11} + y_{21} + y_{31} + \ldots + y_{n1}$
(= 16 in the example). The mean of the readings in the jth column (the
mean fall in blood pressure in the jth animal in the example given
above), which is usually called $\bar{y}_{.j}$, is thus

$$\bar{y}_{.j} = \frac{\sum_{i=1}^{i=n} y_{ij}}{n} = \frac{T_{.j}}{n}. \tag{2.1.2}$$

Again notice that after summing over the values of i (i.e. adding
up the numbers in a specified column) the answer does not involve i,
but does still involve the specified column number j. The symbol i, the
subscript operated on, is replaced by a dot in the symbols $T_{.j}$ and $\bar{y}_{.j}$.

In an exactly similar way the total for the ith row, $T_{i.}$, is written

$$T_{i.} = \sum_{j=1}^{j=k} y_{ij} = y_{i1} + y_{i2} + y_{i3} + \ldots + y_{ik}. \tag{2.1.3}$$

For example, for the second row $T_{2.} = y_{21} + y_{22} + \ldots + y_{2k}$ (= 11 in
the example). The mean value for the ith row is

$$\bar{y}_{i.} = \frac{\sum_{j=1}^{j=k} y_{ij}}{k} = \frac{T_{i.}}{k}. \tag{2.1.4}$$



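Using the numbers in the 4 × 3 table above, the totals and means
are found to be

Row number             Column number (value of j)
(value of i)        1               2               3            Row totals        Row means

     1         $y_{11} = 2$    $y_{12} = 4$    $y_{13} = 3$    $T_{1.} = 9$     $\bar{y}_{1.} = 9/3$
     2         $y_{21} = 1$    $y_{22} = 6$    $y_{23} = 4$    $T_{2.} = 11$    $\bar{y}_{2.} = 11/3$
     3         $y_{31} = 5$    $y_{32} = 7$    $y_{33} = 6$    $T_{3.} = 18$    $\bar{y}_{3.} = 18/3$
     4         $y_{41} = 8$    $y_{42} = 4$    $y_{43} = 5$    $T_{4.} = 17$    $\bar{y}_{4.} = 17/3$

Column totals: $T_{.1} = \sum_{i=1}^{i=n} y_{i1} = 16$, $T_{.2} = \sum_{i=1}^{i=n} y_{i2} = 21$, $T_{.3} = \sum_{i=1}^{i=n} y_{i3} = 18$

Column means: $\bar{y}_{.1} = 16/4$, $\bar{y}_{.2} = 21/4$, $\bar{y}_{.3} = 18/4$

Grand total: $G = 55$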



The grand total of all the observations in the table illustrates the
meaning of a double summation sign. The grand total ($G$, say†) could
be written as the sum of the row totals

$$G = \sum_{i=1}^{i=n} (T_{i.}).$$

Inserting the definition of $T_{i.}$ from (2.1.3) gives

$$G = \sum_{i=1}^{i=n} \left( \sum_{j=1}^{j=k} y_{ij} \right).$$

Equally, the grand total could be written as the sum of the column
totals

$$G = \sum_{j=1}^{j=k} (T_{.j}),$$

which, inserting the definition of $T_{.j}$ from (2.1.1), becomes

$$G = \sum_{j=1}^{j=k} \left( \sum_{i=1}^{i=n} y_{ij} \right).$$

Since the grand total is the same whichever order the additions are
carried out in, the parentheses are superfluous and the operation is
usually symbolized

$$G = \sum_{i=1}^{i=n} \sum_{j=1}^{j=k} y_{ij} \quad \text{or} \quad \sum_{j=1}^{j=k} \sum_{i=1}^{i=n} y_{ij} \quad \text{or simply} \quad \sum\sum y_{ij}.$$

† It would be more consistent with earlier notation to replace both suffixes by dots
and call the grand total $T_{..}$, but the symbol $G$ is often used instead.
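
The row totals, column totals, means, and grand total can be checked with a short sketch. The entries are those of the 4 × 3 example; they reproduce the totals of 16 (first column) and 11 (second row) quoted in the text. The variable names are chosen here for illustration only.

```python
# The 4 x 3 example table: n = 4 rows (observations), k = 3 columns (animals).
table = [
    [2, 4, 3],
    [1, 6, 4],
    [5, 7, 6],
    [8, 4, 5],
]
n = len(table)       # number of rows
k = len(table[0])    # number of columns

# Column totals: sum over i for each fixed j, as in (2.1.1).
column_totals = [sum(table[i][j] for i in range(n)) for j in range(k)]
# Row totals: sum over j for each fixed i, as in (2.1.3).
row_totals = [sum(table[i][j] for j in range(k)) for i in range(n)]

column_means = [t / n for t in column_totals]   # as in (2.1.2)
row_means = [t / k for t in row_totals]         # as in (2.1.4)

# The grand total is the same whichever order the additions are carried out in.
grand_total_by_rows = sum(row_totals)
grand_total_by_columns = sum(column_totals)

print(column_totals, row_totals, grand_total_by_rows, grand_total_by_columns)
```

Each list comprehension sums over one subscript while holding the other fixed, which is why each total still carries one index, whereas the grand total carries none.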

What to do if you get stuck

If it is ever unclear how to manipulate the summation operator
simply write out the sum term by term and apply the ordinary rules of
algebra. For example, if k denotes a constant then

$$\sum_{i=1}^{i=n} k x_i = k \sum_{i=1}^{i=n} x_i \tag{2.1.5}$$

because the left-hand side, written out in full, is $kx_1 + kx_2 + \ldots + kx_n$
$= k(x_1 + x_2 + \ldots + x_n)$, which is the right-hand side. Thus if k is the same
for every x it can be 'taken outside the summation sign'. However
$\Sigma k_i x_i$, in which each x is multiplied by a different constant, is
$k_1 x_1 + k_2 x_2 + \ldots + k_n x_n$, which cannot be further simplified.

It follows from what has been said that if the quantities to be added
do not contain the subscript then the summation becomes a simple
multiplication. If all the $x_i = 1$ in (2.1.5) then

$$\sum_{i=1}^{i=n} k = k + k + \ldots + k = nk \tag{2.1.6}$$

and furthermore, if $k = 1$,

$$\sum_{i=1}^{i=n} 1 = n. \tag{2.1.7}$$

Another useful result is

$$\sum_{i=1}^{n} (x_i - y_i) = (x_1 - y_1) + (x_2 - y_2) + \ldots + (x_n - y_n)$$

$$= (x_1 + x_2 + \ldots + x_n) - (y_1 + y_2 + \ldots + y_n)$$

$$= \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} y_i. \tag{2.1.8}$$

These results will be used often in later sections.
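
Results (2.1.5) to (2.1.8) can be confirmed by writing the sums out numerically, which is just the 'term by term' advice carried out by machine. The values of x, y, and k are invented, and were chosen so that the floating-point arithmetic is exact.

```python
# Hypothetical values, for illustration only.
x = [3.0, 1.0, 4.0, 1.0, 5.0]
y = [2.0, 7.0, 1.0, 8.0, 2.0]
k = 2.5
n = len(x)

# (2.1.5): a constant can be taken outside the summation sign.
lhs_5 = sum(k * xi for xi in x)
rhs_5 = k * sum(x)

# (2.1.6): summing a constant n times gives nk; (2.1.7) is the case k = 1.
sum_of_constant = sum(k for _ in range(n))
sum_of_ones = sum(1 for _ in range(n))

# (2.1.8): the sum of differences equals the difference of the sums.
lhs_8 = sum(xi - yi for xi, yi in zip(x, y))
rhs_8 = sum(x) - sum(y)

print(lhs_5 == rhs_5, sum_of_constant == n * k, sum_of_ones == n, lhs_8 == rhs_8)
```

With less convenient numbers the two sides of each identity may differ in the last decimal place because of rounding, so a comparison within a small tolerance would then be the safer check.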






2.2. Probability 

The only rigorous definition of probability is a set of axioms defining
its properties, but the following discussion will be limited to the less 
rigorous level that is usual among experimenters. For practical pur- 
poses the probability of an event is a number between zero (implying 
impossibility) and one (implying certainty). Although statisticians 
differ in the way they define and interpret probability, there is complete 
agreement about the rules of probability described in § 2.4. In most of 
this book probability will be interpreted as a proportion or relative 
frequency. An excellent discussion of the subject can be found in 
Lindley (1965, Chapter 1). 

The simplest way of defining probability is as a proportion, viz.
'the ratio of the number of favourable cases to the total number of
equiprobable cases'. This may be thought unsatisfactory because the
concept to be defined is introduced as part of its own definition by the
word 'equiprobable', though a non-numerical ordering of likeliness
more primitive than probability would be sufficient to define 'equally
likely', and hence 'random'. Nevertheless when the reference set of
'the total number of equiprobable cases' is finite this description is
used and accepted in practice. For example if 55 per cent of the popula-
tion of college students were male it would be asserted that the prob-
ability of a single individual chosen from this finite population being
male is 0.55, provided that the probability of being chosen was the
same for all individuals, i.e. provided that the choice was made at
random.

When the reference population is infinite the ratio just discussed 
cannot be used. In this case the frequency definition of probability is 
more useful. This identifies the probability P of an event as the limiting 
value of the relative frequency of the event in a random sequence of 
trials when the number of trials becomes very large (tends towards 
infinity). For example, if an unbiased coin is tossed ten times it would 
not be expected that there would be exactly five heads. If it were tossed 
100 times the proportion of heads would be expected to be rather 
closer to 0.5 and as the number of tosses was extended indefinitely the
proportion of heads would be expected to converge on exactly 0.5.
This type of definition seems reasonable, and is often invoked in 
practice, but again it is by no means satisfactory as a complete, 
objective definition. A random sequence cannot be proved to converge 
in the mathematical sense (and in fact any outcome of tossing a true
coin a million times is possible), but it can be shown to converge in a
statistical sense. 
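
The frequency definition can be illustrated, though of course not proved, by simulation. In this sketch a pseudo-random generator is assumed to behave like an unbiased coin; the seed and the numbers of tosses are arbitrary choices.

```python
import random


def proportion_of_heads(n_tosses, rng):
    """Toss a simulated unbiased coin n_tosses times; return the proportion of heads."""
    heads = sum(1 for _ in range(n_tosses) if rng.random() < 0.5)
    return heads / n_tosses


rng = random.Random(2024)   # fixed seed so the run is repeatable (an arbitrary choice)
for n_tosses in (10, 100, 10_000, 1_000_000):
    print(n_tosses, proportion_of_heads(n_tosses, rng))
```

The proportions would be expected to settle down near 0.5 as the number of tosses grows, but no finite run is guaranteed to do so, which is exactly the reservation made above.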

Degrees of belief 

It can be argued persuasively (e.g. Lindley (1965, p. 29)) that it is 
valid and sometimes necessary to use a subjective definition of prob- 
ability as a numerical measure of one's degree of belief or strength of 
conviction in a hypothesis ('personal probability'). This is required in 
many applications of Bayes' theorem, which is mentioned in §§ 1.3 and 
2.4 (see also § 6.1, para. (7)). However the application of Bayes' theorem 
to medical diagnosis (§ 2.4) does not involve subjective probabilities, 
but only frequencies. 

2.3. Randomization and random sampling 

The selection of random samples from the population under study 
is the basis of the design of experiments, yet is an extraordinarily 
difficult job. Any sort of statistical analysis (and any sort of intuitive
analysis) of observations depends on random selection and allocation 
having been properly done. The very fundamental place of randomiza- 
tion is particularly obvious in the randomization significance tests 
described in Chapters 8-11. 

It should never be out of mind that all calculations (and all intuitive 
assessments) belong to an entirely imaginary world of perfect random 
selection, unbiased measurement, and often many other ideal properties 
(see § 11.2). The assumption that the real world resembles this imagin- 
ary one is an extrapolation outside the scope of statistics or mathe- 
matics. As mentioned in Chapter 1 it is safer to assume that samples 
have some unknown bias. 

For example, an anti-diabetic drug should ideally be tested on a 
random sample of all diabetics in the world — or perhaps of all dia- 
betics in the world with a specified form and severity of the disease, 
or all diabetics in countries where the drug is available. In fact, what is 
likely to be available are the diabetic patients of one, or a few, 
hospitals in one country. Selection should be done strictly at random 
(see below) from this restricted population, but extension of inferences 
from this population to a larger one is bound to be biased to an un- 
known extent. 

It is, however, quite easy, having obtained a sample, to divide it 
strictly randomly (see below) into several groups (e.g. groups to receive 
new drug, old drug, and control dummy drug). This is, nevertheless, 
very often not done properly. The hospital numbers of the patients will 
not do, and neither will their order of appearance at a clinic. It is very 
important to realize that 'random' is not the same thing as 'haphazard'. 
If two treatments are to be compared on a group of patients it is not 
good enough for the experimenter, or even a neutral person, to allocate 
a patient haphazardly to a treatment group. It has been shown re- 
peatedly that any method involving human decisions is non-random. 
For all practical purposes the following interpretation of randomness, 
given by R. A. Fisher (1951, p. 11), should be taken as a fundamental 
principle of experimentation: ' . . . not determined arbitrarily by 
human choice, but by the actual manipulation of the physical apparatus 
used in games of chance, cards, dice, roulettes, etc., or, more ex- 
peditiously, from a published collection of random sampling numbers 
purporting to give the actual results of such manipulation.' 

Published random sampling numbers are, in practice, the only 
reliable method. Samples selected in this way (see below) will be referred 
to as selected strictly at random. Superb discussions of the crucial 
importance of, and the pitfalls involved in, random sampling have been 
given by Fisher (1951, especially Chapters 2 and 3) and by Mainland 
(1963, especially Chapters 1-7). Every experimenter should have read 
these. They cannot be improved upon here. 

How to select samples strictly at random using random number tables 

This is, perhaps, the most important part of the book. There are 
various ways of using random number tables (see, for example, the 
introduction to the tables of Fisher and Yates (1963)). Two sorts of 
tables are commonly encountered, and those of Fisher and Yates (1963) 
will be used as examples. The first is a table of random digits in which 
the digits from 0 to 9 occur in random order. The digits are usually 
printed in groups of two to make them easier to read, but they can be 
taken as single digits or as two, three, etc. digit numbers. If taken in 
groups of three the integers from 000 to 999 will occur in random order 
in the tables. The second form of table is the table of random permuta- 
tions. Fisher and Yates (1963) give random permutations of 10 and 20 
integers. In the former the integers from 0 to 9, and in the latter the 
integers from 0 to 19, occur in random order, but each number appears 
once only in each permutation. 

To divide a group of subjects into several sub-groups strictly at 
random the easiest method is to use the tables of random permutations, 
as long as the total number of subjects is not more than 20 (or whatever 
is the largest size of random permutation available). Suppose that 15 
subjects are to be divided into groups of size $n_1$, $n_2$, and $n_3$. First number 
the subjects 0 to 14 in any convenient way. Then obtain a random 
permutation of 15 by taking the first random permutation of 20 
from the tables and deleting the numbers 15 to 19. (This permutation in 
the table should then be crossed out so that it is not used again; use 
each once only.) Then allocate the first $n_1$ subjects in the permutation to the first 
group, the next $n_2$ to the second group, and the remainder to the third 
group. For example, if the random permutation of 15 turned out to be 
1, 6, 8, 5, 10, 12, 11, 9, 2, 0, 3, 14, 7, 4, 13 (the first permutation from 
Fisher and Yates (1963, p. 142)) and the 15 subjects were to be divided 
randomly into groups of 5, 4, and 6 subjects then subjects 1, 6, 8, 5, and 
10 would go in the first group, 12, 11, 9, and 2 in the second group, and 
the rest in the third group. 
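
The splitting step can be sketched in Python. This is only an illustration: the function names are hypothetical, and `pseudo_random_permutation` uses the computer's pseudo-random generator as a stand-in for a tabulated permutation, which is not the printed-table method recommended in the text.

```python
import random


def split_permutation(permutation, group_sizes):
    """Cut a permutation of subject numbers into consecutive groups."""
    if sum(group_sizes) != len(permutation):
        raise ValueError("group sizes must add up to the number of subjects")
    groups, start = [], 0
    for size in group_sizes:
        groups.append(list(permutation[start:start + size]))
        start += size
    return groups


def pseudo_random_permutation(n_subjects, seed=None):
    """Assumed stand-in for a tabulated permutation of 0..n_subjects-1."""
    subjects = list(range(n_subjects))
    random.Random(seed).shuffle(subjects)
    return subjects


# The permutation quoted in the text, split into groups of 5, 4, and 6.
book_permutation = [1, 6, 8, 5, 10, 12, 11, 9, 2, 0, 3, 14, 7, 4, 13]
groups = split_permutation(book_permutation, [5, 4, 6])
print(groups)
```

Applied to the permutation quoted above, the first group is subjects 1, 6, 8, 5, and 10, as in the text.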

For larger numbers of subjects the tables of random digits must be 
used. For example, to divide 24 subjects into 4 groups of 6 the procedure 
is as follows. First number the subjects in any convenient way with the 
numbers 00 to 23. Take the digits in the table in groups of two. The 
table then gives the integers from 00 to 99 in random order. One 
procedure would be to delete all numbers from 24 to 99, but it is more 
economical to delete only 96, 97, 98, and 99 (i.e. those equal to or larger 
than 96, which is the biggest multiple of 24 that is not larger than 100). 
Now the remaining numbers are a random sequence of the integers 
from 00 to 95. From each number between 24 and 47 subtract 24; from 
each number between 48 and 71 subtract 48 ; and from each number 
between 72 and 95 subtract 72 (or, in other words, divide every number 
in the sequence by 24 and write down the remainder). For example, if 
the number in the table is 94 then write down 22; or in place of 55 
write down 07. (The numbers from 96 to 99 must, of course, be omitted 
because their presence would give the numbers 00 to 03 a larger chance 
than the others of occurring.) Some numbers may appear several times 
but repetitions are ignored. If the final sequence were 21, 04, 07, 13, 02, 
02, 04, 09, 00, 23, 14, 13, 11, etc., then subjects 21, 04, 07, 13, 02, 09 are 
allocated to the first group, subjects 00, 23, 14, 11, etc. are allocated to 
the second group, and so on. 
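
The procedure for two-digit random numbers can be sketched as follows. The function name is hypothetical, and the input list is assumed to have been read from a table of random digits taken in pairs; the code only carries out the deletion, remainder, and repetition rules described above.

```python
def allocate_from_digit_pairs(numbers, n_subjects, n_groups):
    """Allocate subjects 0..n_subjects-1 to equal groups from two-digit numbers."""
    if n_subjects > 100 or n_subjects % n_groups != 0:
        raise ValueError("need at most 100 subjects, divisible into equal groups")
    group_size = n_subjects // n_groups
    # Largest multiple of n_subjects not exceeding 100 (96 for 24 subjects).
    cutoff = (100 // n_subjects) * n_subjects
    order = []
    for number in numbers:
        if number >= cutoff:
            continue  # omitted so that no subject number is favoured
        subject = number % n_subjects  # remainder after dividing by n_subjects
        if subject not in order:  # repetitions are ignored
            order.append(subject)
        if len(order) == n_subjects:
            break
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]


sequence = [21, 4, 7, 13, 2, 2, 4, 9, 0, 23, 14, 13, 11]
print(allocate_from_digit_pairs(sequence, 24, 4))
```

With the sequence quoted in the text the first group is 21, 04, 07, 13, 02, 09, and the second group so far contains 00, 23, 14, 11.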

The method is simpler for the random block experiments described 
in §§11.6 and 11.7. Blocks are never likely to contain more than 20 
treatments so the order in which the treatments occur in each block is 
taken from a random permutation found from the tables of random 
permutations as above. For example, if there are four treatments in 
each block number them 0 to 3, and for each block obtain a random 
permutation of the numbers 0 to 3, by deleting 4 to 9 from the tabulated 
random permutations of 10, crossing out each permutation from the 
table as it is used. 

The selection of a Latin square at random is more complicated 
(see § 11.8). 

2.4. Three rules of probability 

The words 'and' and 'or' are printed in bold type when they are being 
used in a restricted logical sense. For our purposes $E_1 \textbf{ and } E_2$ means 
that both the event $E_1$ and the event $E_2$ occur, and $E_1 \textbf{ or } E_2$ means that 
either $E_1$ or $E_2$ or both occur (in general, that at least one of several 
events occurs). More explanation and details of the following rules can 
be found, for example, in Mood and Graybill (1963), Brownlee (1966), 
or Lindley (1965). 

(1) The addition rule of probability 

This states that the probability of either or both of two events, $E_1$ 
and $E_2$, occurring is 

$$P[E_1 \textbf{ or } E_2] = P[E_1] + P[E_2] - P[E_1 \textbf{ and } E_2]. \qquad (2.4.1)$$

If the events are mutually exclusive, i.e. if the probability of $E_1$ and 
$E_2$ both occurring is zero, $P[E_1 \textbf{ and } E_2] = 0$, then the rule reduces to 

$$P[E_1 \textbf{ or } E_2] = P[E_1] + P[E_2], \qquad (2.4.2)$$

the sum of probabilities. Thus, if the probability that a drug will 
increase the blood pressure is 0.9 and the probability that it will have 
no effect or decrease it is 0.1 then the probability that it will either 
(a) increase the blood pressure or (b) have no effect or decrease it is, 
since the events are mutually exclusive, simply 0.9 + 0.1 = 1.0. 
Because the events considered are exhaustive, the probabilities add up 
to 1.0. It is certain that one of them will occur. That is 

$$P[E \text{ occurs}] = 1 - P[E \text{ does not occur}]. \qquad (2.4.3)$$

This example suggests that the rule can be extended to more than two 
events. For example the probability that the blood pressure will not 
change might be 0.04 and the probability that it will decrease might be 
0.06. Thus 

$$P[\text{no change} \textbf{ or } \text{decrease}] = 0.04 + 0.06 = 0.1 \text{ as before},$$
$$P[\text{no change} \textbf{ or } \text{increase}] = 0.04 + 0.9 = 0.94,$$
$$P[\text{no change} \textbf{ or } \text{decrease} \textbf{ or } \text{increase}] = 0.04 + 0.06 + 0.9 = 1.0.$$






The simple addition rule holds because the events considered are 
mutually exclusive. In the last case only they are also exhaustive. An 
example of the use of the full equation (2.4.1) is given below. 

(2) The multiplication rule of probability 

It is possible that the probability of event $E_1$ happening depends on 
whether $E_2$ has happened or not. The conditional probability of $E_1$ 
happening given that $E_2$ has happened is written $P[E_1|E_2]$, which is 
usually read 'probability of $E_1$ given $E_2$'. 

The probability that both $E_1$ and $E_2$ will happen is 

$$P[E_1 \textbf{ and } E_2] = P[E_1] \cdot P[E_2|E_1] = P[E_2] \cdot P[E_1|E_2]. \qquad (2.4.4)$$

If the events are independent in the probability sense (different 
from the functional independence) then, by definition of independence, 

$$P[E_1|E_2] = P[E_1] \quad \text{and similarly} \quad P[E_2|E_1] = P[E_2], \qquad (2.4.5)$$

so the multiplication rule reduces to 

$$P[E_1 \textbf{ and } E_2] = P[E_1] \cdot P[E_2], \qquad (2.4.6)$$

the product of the separate probabilities. Events obeying (2.4.5) are 
said to be independent. Independent events are necessarily uncorrelated 
but the converse is not necessarily true (see § 12.9). 

A numerical illustration of the probability rules. If the probability 
that a British college student smokes is 0.3, and the probability that 
the student attends London University is 0.01, then the probability 
that a student, selected at random from the population of British 
college students, is both a smoker and attends London University can 
be found from (2.4.6) as† 

$$P[\text{smoker} \textbf{ and } \text{London student}] = P[\text{smoker}] \times P[\text{London student}] = 0.3 \times 0.01 = 0.003$$

as long as smoking and attendance at London University are 
independent so that, from (2.4.5), 

$$P[\text{smoker}] = P[\text{smoker} \,|\, \text{London student}],$$
$$P[\text{London student}] = P[\text{London student} \,|\, \text{smoker}].$$

† Notice that P[smoker] could be written as P[smoker | British college student]. 
All probabilities are really conditional (on membership of a specified population). See, 
for example, Lindley (1965). 



The first of these conditions of independence can be interpreted in 
words as 'the probability of a student being a smoker equals the 
probability that a student is a smoker given that he attends London 
University', that is to say 'the proportion of smoking students in the 
whole population of British college students is the same as the propor- 
tion of smoking students at London University', which in turn implies 
that the proportion of smoking students is the same at London as at 
any other British University (which is, no doubt, not true). 

Because smoking and attendance at London University are not 
mutually exclusive the full form of the addition rule, (2.4.1), must be 
used. This gives 

$$P[\text{smoker} \textbf{ or } \text{London student}] = 0.3 + 0.01 - (0.3 \times 0.01) = 0.307.$$

The meaning of this can be made clear by considering random samples, 
each of 1000 British college students. On the average there would be 
300 smokers and 10 London students in each sample of 1000. There 
would be 3 students (1000 × 0.003) who were both smokers and London 
students if the implausible condition of independence were met (see 
above). Therefore there would be 297 students (300 - 3) who smoked but 
were not from London, and 7 students (10 - 3) who were from London 
but did not smoke. Therefore the number of students who either 
smoked (but did not come from London), or came from London (but 
did not smoke), or both came from London and smoked, would be 
297 + 7 + 3 = 307, as calculated (1000 × 0.307) from (2.4.1). 
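
The arithmetic of this illustration can be checked with a short Python sketch. The function names are hypothetical, and exact fractions are used only to avoid rounding; the independence of smoking and London attendance is the same implausible assumption made in the text.

```python
from fractions import Fraction


def prob_and_independent(p1, p2):
    """Multiplication rule (2.4.6), valid only for independent events."""
    return p1 * p2


def prob_or(p1, p2, p_both):
    """Full addition rule (2.4.1)."""
    return p1 + p2 - p_both


p_smoker = Fraction(3, 10)
p_london = Fraction(1, 100)
p_both = prob_and_independent(p_smoker, p_london)
p_either = prob_or(p_smoker, p_london, p_both)

# Expected numbers in a sample of 1000 students.
sample_size = 1000
smokers_only = sample_size * (p_smoker - p_both)
london_only = sample_size * (p_london - p_both)
both = sample_size * p_both
print(p_both, p_either, smokers_only, london_only, both)
```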

(3) Bayes' theorem, illustrated by the problem of medical diagnosis 

Bayes' theorem has already been given in words as (1.3.1) (see 
§ 1.3). The theorem applies to any series of events $H_j$, and is a simple 
consequence of the rules of probability already stated (see, for example, 
Lindley (1965, p. 19 et seq.)). The interesting applications arise when the 
events considered are hypotheses. If the $j$th hypothesis is denoted 
$H_j$ and the observations are denoted $Y$ then (1.3.1) can be written 
symbolically as 



$$\underbrace{P[H_j|Y]}_{\text{posterior probability of hypothesis } j} = k \times \underbrace{P[Y|H_j]}_{\text{likelihood of hypothesis } j} \times \underbrace{P[H_j]}_{\text{prior probability of hypothesis } j}, \qquad (2.4.7)$$



where $k$ is a proportionality constant. If the set of hypotheses considered 
is exhaustive (one of them must be true), and the hypotheses are 
mutually exclusive (not more than one can be true), the addition 
rule states that the probability of (hypothesis 1) or (hypothesis 2) or 
. . . (which must be equal to one, because one or another of the 
hypotheses is true) is given by the total of the individual probabilities. 
This allows the proportionality constant in (2.4.7) to be found. Thus 

$$\sum_{\text{all } j} P[H_j|Y] = k \sum_{\text{all } j} \left( P[Y|H_j] \cdot P[H_j] \right) = 1 \quad \text{and therefore} \quad k = \frac{1}{\sum_{\text{all } j} \left( P[Y|H_j] \cdot P[H_j] \right)}. \qquad (2.4.8)$$

Bayes' theorem has been used in medical diagnosis. This is an 
uncontroversial application of the theorem because all the probabilities 
can be interpreted as proportions. Subjective, or personal, probabilities 
are not needed in this case (see § 2.2). 

If a patient has a set of symptoms S (the observations) then the 
probability that he is suffering from disease D (the hypothesis) is, 
from (2.4.7), 

$$P[D|S] = k \times P[S|D] \times P[D]. \qquad (2.4.9)$$

In this equation the prior probability of a patient having disease D, 
P[D], is found from the proportion of patients with this disease in the 
hospital records. In principle the likelihood of D, i.e. the probability 
of observing the set of symptoms S if a patient in fact has disease D, 
P[S|D], could also be found from records of the proportion of patients 
with D showing the particular set of symptoms observed. However, if 
a realistic number of possible symptoms is considered the number of 
different possible sets of symptoms will be vast and the records are not 
likely to be extensive enough for P[S|D] to be found in this way. This 
difficulty has been avoided by assuming that symptoms are independent 
of each other so that the simple multiplication rule (2.4.6) can be 
applied to find $P[S|D]$ as the product of the separate probabilities of 
patients with $D$ having each individual symptom, i.e. 

$$P[S|D] = P[S_1|D] \times P[S_2|D] \times \ldots \times P[S_n|D], \qquad (2.4.10)$$

where $S$ stands for the set of $n$ symptoms ($S_1$ and $S_2$ and . . . and 
$S_n$) and $P[S_1|D]$, for example, is found from the records as the proportion 
of patients with disease $D$ who have symptom 1. Although the 
assumption of independence is very implausible this method seems 
to have given some good results (see, for example, Bailey (1967, 
Chapter 11)). 
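
A minimal sketch of how (2.4.7), (2.4.8), and (2.4.10) fit together is given below. The two diseases, the two symptoms, and every probability in it are invented purely for illustration and are not taken from any records; the function name is also hypothetical. The diseases are assumed to be mutually exclusive and exhaustive, and the symptoms are assumed independent within each disease.

```python
from fractions import Fraction


def posterior_probabilities(priors, symptom_probs, observed_symptoms):
    """Posterior P[D|S] for each disease, assuming independent symptoms."""
    unnormalized = {}
    for disease, prior in priors.items():
        likelihood = Fraction(1)
        for symptom in observed_symptoms:
            # Multiplication rule (2.4.10): product over observed symptoms.
            likelihood *= symptom_probs[disease][symptom]
        unnormalized[disease] = likelihood * prior
    # Proportionality constant k from (2.4.8).
    k = 1 / sum(unnormalized.values())
    return {disease: k * value for disease, value in unnormalized.items()}


# Invented figures: priors P[D] and P[symptom | D] for two diseases.
priors = {"D1": Fraction(1, 5), "D2": Fraction(4, 5)}
symptom_probs = {
    "D1": {"s1": Fraction(9, 10), "s2": Fraction(1, 2)},
    "D2": {"s1": Fraction(1, 10), "s2": Fraction(1, 4)},
}
posterior = posterior_probabilities(priors, symptom_probs, ["s1", "s2"])
print(posterior)
```

With these invented figures the rarer disease D1 becomes the more probable one once both symptoms are observed, because both symptoms are much commoner in D1.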

A numerical example 

The simplest (to the point of naivety) example of the above argument 
is the case when only one disease and one symptom are considered. 
The example is modified from Wallis and Roberts (1956). 

Suppose that a diagnostic test for cancer has a probability of 0.96 
of being positive when the patient does have cancer. If $S$ stands for the 
event that the test is positive and $\bar{S}$ for the event that it is negative 
(the data), and if $D$ stands for the event that the patient has cancer, 
and $\bar{D}$ for the event that he has not (the two hypotheses) then in 
symbols $P[S|D] = 0.96$ (the likelihood of $D$ if $S$ is observed). Because 
the test is either positive or not a slight extension of (2.4.3) gives 
$P[\bar{S}|D] = 1 - P[S|D] = 0.04$ (the likelihood of $D$ if $\bar{S}$ is observed). The 
proportion of patients with cancer giving a negative test (false negatives) 
is 4 per cent. Suppose also that 95 per cent of patients without 
cancer give a negative test, $P[\bar{S}|\bar{D}] = 0.95$. Similarly $P[S|\bar{D}] = 1 - 0.95 
= 0.05$, i.e. 5 per cent of patients without cancer give a positive test 
(false positives). As diagnostic tests go, these proportions of false 
results are not outrageous. But now consider what happens if the test 
is applied to a population of patients of whom 1 in 200 (0.5 per cent) 
suffer from cancer, i.e. $P[D] = 0.005$ (the prior probability of $D$) and 
$P[\bar{D}] = 1 - 0.005 = 0.995$ (from (2.4.3) again). What is the probability 
that a patient reacting positively to the test actually has cancer? 
In symbols this is $P[D|S]$, the posterior probability of $D$ after observing 
$S$, and from (2.4.7) or (2.4.9), and (2.4.8) it is, using the probabilities 
assumed above, 



$$\begin{aligned}
P[D|S] &= \frac{P[S|D] \cdot P[D]}{P[S|D] \cdot P[D] + P[S|\bar{D}] \cdot P[\bar{D}]} \\
&= \frac{0.96 \times 0.005}{(0.96 \times 0.005) + (0.05 \times 0.995)} \\
&= \frac{0.0048}{0.0048 + 0.04975} = \frac{0.0048}{0.05455} = 0.0880. \qquad (2.4.11)
\end{aligned}$$



In other words, only 8.80 per cent of positive reactors actually have 
cancer, and 100 - 8.80 = 91.2 per cent do not have cancer. Not such 
a good performance. It remains true that 96 per cent of those with 
cancer are detected by the test, but a great many more without cancer 
also give positive tests. 

It is easy to see how this arises without the formality of Bayes' 
theorem. Suppose that 100000 patients are tested. On average 500 
(= 100000 × 0.005) will have cancer and 99500 will not have cancer. 
Of the 500 with cancer, 500 × 0.96 = 480 will give positive reactions on 
average. Of the 99500 without cancer, 99500 × 0.05 = 4975 will give 
positive reactions on average (a much smaller proportion, but a much 
larger number than for the patients with cancer). Of the total number 
of positive reactors, 480 + 4975 = 5455, the number with cancer is 
480 and the proportion with cancer is 480/5455 = 0.0880 as above. 
If these numbers are divided by the total number of patients, 100000, 
they are seen to coincide with the probabilities calculated by Bayes' 
theorem in (2.4.11). 
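
Both routes to the answer, Bayes' theorem and the head count of 100000 patients, can be reproduced in a few lines of Python. The function and variable names (for example `sensitivity` and `prevalence`) are labels chosen here, not terms used in the text, and exact fractions are used so that the two routes can be compared without rounding.

```python
from fractions import Fraction


def posterior_given_positive(p_pos_given_disease, p_pos_given_healthy, prior):
    """P[D|S] from (2.4.11) for one disease and one test."""
    numerator = p_pos_given_disease * prior
    denominator = numerator + p_pos_given_healthy * (1 - prior)
    return numerator / denominator


sensitivity = Fraction(96, 100)         # P[S|D]
false_positive_rate = Fraction(5, 100)  # P[S|not D]
prevalence = Fraction(5, 1000)          # P[D]

posterior = posterior_given_positive(sensitivity, false_positive_rate, prevalence)

# The same result by counting patients.
n_patients = 100000
with_cancer = n_patients * prevalence
true_positives = with_cancer * sensitivity
false_positives = (n_patients - with_cancer) * false_positive_rate
proportion = true_positives / (true_positives + false_positives)
print(float(posterior), float(proportion))
```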

2.5. Averages 

If a number of replicate observations is made of a variable quantity 
it is commonly found that the observations tend to cluster round some 
central value. Some sort of average of the observations is taken as an 
estimate of the true or population value (see § 1.2) of the quantity that 
is being measured. Some of the possible sorts of average will be defined 
now. It can be seen that there is no logical reason for the automatic 
use of the ordinary unweighted arithmetic mean, (2.5.2). If the distri- 
bution of the observations is not symmetrical it may be quite inappro- 
priate, and nonparametric methods usually use the median (see §§ 
4.5, 6.2, and 7.3 and Chapters 9, 10, and 14). 



The arithmetic mean 

The general form is the weighted arithmetic sample mean (using the 
notation described in § 2.1), 

$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}. \qquad (2.5.1)$$

This provides an estimate, from a sample of observations, of the 
unknown population mean value of $x$ (as long as the sample was taken 
strictly at random, see § 2.3). The population mean is the mean of all 
the values of $x$ in the population from which the sample was taken and 
will be denoted $\mu$ (see § A1.1 for a more rigorous definition). 

The weight of an observation. The weight, $w_i$, associated with the $i$th observation, 
$x_i$, is a measure of the relative importance of the observation in the final 
result. Usually the weight is taken as the reciprocal of the variance (see § 2.6 and 
(2.7.12)), so the observations with the smallest scatter are given the greatest 
weight. If the observations are uncorrelated, this procedure gives the best 
estimate of the population mean, i.e. an unbiased estimate with minimum variance 
(maximum precision). (See §§ 12.1 and 12.7, Appendix 1, and Brownlee (1965, 
p. 95).) From (2.7.8) it is seen that halving the variance is equivalent to doubling 
the number of observations. Both double the weight. See § 13.4 for an example. 

Weights may also be arbitrarily decided degrees of relative importance. For 
example, if it is decided in examination marking that the mark for an essay 
paper should have twice the importance of the marks for the practical and oral 
examinations, a weighted average mark could be found by assigning the essay 
mark (say 70 per cent) a weight of 2 and the practical and oral marks (say 
30 and 40 per cent) a weight of one each. Thus 

$$\bar{x} = \frac{(2 \times 70) + (1 \times 30) + (1 \times 40)}{2 + 1 + 1} = 52.5.$$

If the weights had been chosen as 1, 0.5, and 0.5 the result would have been 
exactly the same. 

The definition of the weighted mean has the following properties. 

(a) If all the weights are the same, say $w_i = w$, then the ordinary 
unweighted arithmetic mean is found (using (2.1.5) and (2.1.6)), 

$$\bar{x} = \frac{w \sum x_i}{N w} = \frac{\sum x_i}{N}. \qquad (2.5.2)$$

In the above example the unweighted mean is $\sum x_i/N = (70 + 30 + 40)/3 
= 46.7$. 

(b) If all the observations (values of $x_i$) are the same, then $\bar{x}$ has this 
value whatever the weights. 

(c) If one value of $x$ has a very large weight compared with the 
others then $\bar{x}$ approaches that value, and conversely if its weight is zero 
an observation is ignored. 
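
The weighted mean and its three properties can be tried out with the examination marks used above. This is a small sketch; the function name `weighted_mean` is hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("need one weight per observation")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not add up to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


marks = [70, 30, 40]  # essay, practical, and oral marks from the text
print(weighted_mean(marks, [2, 1, 1]))      # essay counted twice
print(weighted_mean(marks, [1, 0.5, 0.5]))  # same relative weights
print(weighted_mean(marks, [1, 1, 1]))      # unweighted mean
print(weighted_mean(marks, [1, 1, 0]))      # oral mark ignored
```

Only the relative sizes of the weights matter, which is why the weights 2, 1, 1 and 1, 0.5, 0.5 give the same mean.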

The geometric mean 

The unweighted geometric mean of $N$ observations is defined as the 
$N$th root of their product (cf. arithmetic mean which is the $N$th part of 
their sum). 

$$\text{geometric mean} = \left( \prod_{i=1}^{N} x_i \right)^{1/N}. \qquad (2.5.3)$$

It will now be shown that this is the sort of mean found when the 
arithmetic mean of the logarithms of the observations is calculated 
(as, for example, in § 12.6), and the antilog of the result found. 

Call the original observations $z_i$, and their logarithms $x_i$, so 

$$x_i = \log z_i.$$

Then 

$$\begin{aligned}
\text{arithmetic mean of } \log z = \bar{x} = \overline{\log z} &= \frac{\sum x_i}{N} = \frac{\sum (\log z_i)}{N} \\
&= \frac{\log \left( \prod z_i \right)}{N} \quad \text{(because the sum of the logs is the log of the product)} \\
&= \log \left\{ \sqrt[N]{\textstyle\prod z_i} \right\} \\
&= \log (\text{geometric mean of } z)
\end{aligned}$$

or, taking antilogs, 

$$\text{antilog}(\text{arithmetic mean of } \log z) = \text{geometric mean of } z. \qquad (2.5.4)$$

This relationship is the usual way of calculating the geometric mean. 
For the figures used above the unweighted geometric mean is the cube 
root of their product, $(70 \times 30 \times 40)^{1/3} = 43.8$. The geometric mean 
of a set of positive figures is never greater than their arithmetic mean, and 
is less than it, as in this case, unless all the figures are equal. If even a 
single observation is zero, the geometric mean will be zero. 
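
Relation (2.5.4) translates directly into code. The sketch below uses natural logarithms (any base gives the same answer, provided the antilog uses the same base); the function name is hypothetical, and the explicit handling of zero and negative observations is a choice made here.

```python
import math


def geometric_mean(values):
    """Antilog of the arithmetic mean of the logarithms, as in (2.5.4)."""
    if not values:
        raise ValueError("need at least one observation")
    if any(v < 0 for v in values):
        raise ValueError("logarithms need non-negative observations")
    if any(v == 0 for v in values):
        return 0.0  # a single zero observation makes the product zero
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)


figures = [70, 30, 40]
gm = geometric_mean(figures)
am = sum(figures) / len(figures)
print(round(gm, 1), round(am, 1))
```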

The median 

The population (or true) median is the value of the variable such that 
half the values in the population fall above it and half below it (i.e. 
it is the value bisecting the area under the distribution curve, see 
Chapters 3 and 4). It is not necessarily the same as the population mean 
(see § 4.5). The population median is estimated by the 

$$\text{sample median} = \text{central observation}. \qquad (2.5.5)$$

This is uniquely defined if the number of observations is odd. The 
median of the 5 observations 1, 4, 9, 7, 6, is seen, when they are ranked 
in order of increasing size giving 1, 4, 6, 7, 9, to be 6. If there is an even 
number of observations the sample median is taken half-way between 
the two central observations; for example the sample median of 1, 4, 6, 7, 
9, 12 is 6½. 
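
The ranking rule for the sample median can be written out as a short function (the name `sample_median` is chosen here for illustration).

```python
def sample_median(observations):
    """Central observation after ranking; mean of the two central ones if N is even."""
    ranked = sorted(observations)
    n = len(ranked)
    if n == 0:
        raise ValueError("need at least one observation")
    middle = n // 2
    if n % 2 == 1:
        return ranked[middle]
    return (ranked[middle - 1] + ranked[middle]) / 2


print(sample_median([1, 4, 9, 7, 6]))
print(sample_median([1, 4, 6, 7, 9, 12]))
```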






The mode 

The sample mode is the most frequently observed value of a variable. 
The population mode is the value corresponding to the peak of the 
population distribution curve (Chapters 3 and 4). It may be different 
from the mean and median (see § 4.5). 

The arithmetic mean as a least squares estimate 

This section anticipates the discussion of estimation in § 2.6, Chapter 12, and 
§ A1.8. The arithmetic mean of a sample is said to be a least squares estimate 
(see Chapter 12) because it is the value that best represents the sample in the 
sense that the sum of the squares of the deviations of the observations from the 
arithmetic mean, $\sum (x_i - \bar{x})^2$, is smaller than the sum of the squares of the deviations 
from any other value. This can be shown without using calculus as follows. 

Suppose, as above, that the sample consists of $N$ observations, $x_1, x_2, \ldots, x_N$. It 
is required to find a value of $m$ that makes $\sum (x_i - m)^2$ as small as possible. This 
follows immediately from the algebraic identity 

$$\sum (x_i - m)^2 = \sum (x_i - \bar{x})^2 + N(\bar{x} - m)^2. \qquad (2.5.6)$$

The value of $m$ that minimizes this is clearly $m = \bar{x}$, the arithmetic mean, 
because this makes the last term zero, as small as it can be. For the example 
following (2.5.2), the sum of squares 

$$\sum (x_i - \bar{x})^2 = (70 - 46.7)^2 + (30 - 46.7)^2 + (40 - 46.7)^2 = 866.7,$$

and a few trials will show that inserting any value other than 46.7 makes the sum 
of squares larger than 866.7. 

The intermediate steps in establishing (2.5.6) are easy. By definition of the 
arithmetic mean $N\bar{x} = \sum x_i$, so the right-hand side of (2.5.6) can be written, 
expanding the squares and using (2.1.6), as 

$$\begin{aligned}
&\sum (x_i^2 - 2x_i\bar{x} + \bar{x}^2) + N\bar{x}^2 - 2N\bar{x}m + Nm^2 \\
&= \sum x_i^2 - 2\bar{x}\sum x_i + N\bar{x}^2 + N\bar{x}^2 - 2N\bar{x}m + Nm^2 \\
&= \sum x_i^2 - 2N\bar{x}^2 + N\bar{x}^2 + N\bar{x}^2 - 2N\bar{x}m + Nm^2 \\
&= \sum x_i^2 - 2N\bar{x}m + Nm^2 \\
&= \sum (x_i^2 - 2mx_i + m^2) = \sum (x_i - m)^2
\end{aligned}$$

as stated in (2.5.6). 

Using calculus the same result can be reached more elegantly. The usual way of 
finding a minimum in calculus is to differentiate and equate the result to zero 
(see Thompson (1965, p. 78)). This process is described in detail, and illustrated, 
in Chapter 12. In this case $\sum (x_i - m)^2$ is to be minimized with respect to $m$. 
Differentiating $\sum (x_i - m)^2 = \sum x_i^2 - 2m\sum x_i + Nm^2$, remembering that the $x_i$ are 
constants for a given sample, and equating to zero, gives 

$$\frac{\mathrm{d}}{\mathrm{d}m} \left[ \sum (x_i - m)^2 \right] = -2\sum x_i + 2Nm = 0. \qquad (2.5.7)$$

Therefore $2Nm = 2\sum x_i$ 

and $m = \sum x_i / N = \bar{x}$ 

as found above. 
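
The 'few trials' suggested above are easily run by machine. The sketch below evaluates the sum of squares about several trial values of m for the marks 70, 30, and 40, and compares each with the value predicted by identity (2.5.6); the function name and the particular trial values are arbitrary choices made here.

```python
def sum_of_squares_about(values, m):
    """Sum of squared deviations of the observations from the value m."""
    return sum((x - m) ** 2 for x in values)


values = [70, 30, 40]
n = len(values)
mean = sum(values) / n
minimum = sum_of_squares_about(values, mean)  # about 866.7

trial_values = [40, 45, 46, 47, 50]
for m in trial_values:
    direct = sum_of_squares_about(values, m)
    via_identity = minimum + n * (mean - m) ** 2  # right-hand side of (2.5.6)
    print(m, round(direct, 1), round(via_identity, 1))
```

Every trial value gives a larger sum of squares than the arithmetic mean, and the excess is N times the squared distance of m from the mean.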




2.6. Measures of the variability of observations 

When replicate observations are made of a variable quantity the 
scatter of the observations, or the extent to which they differ from 
each other, may be large or may be small. It is useful to have some 
quantitative measure of this scatter. See Chapter 7 for a discussion 
of the way this should be done. As there are many sorts of average 
(or 'measures of location'), so there are many measures of scatter. 
Again separate symbols will be used to distinguish the estimates of 
quantities calculated from (more or less small) samples of observations 
from the true values of these quantities, which could only be found if 
the whole population of possible observations were available. 

The range 

The difference between the largest and smallest observations is the 
simplest measure of scatter but it will not be used in this book. 

The mean deviation 

If the deviation of each observation from the mean of all observations 
is measured, then the sum of these deviations is easily proved (and this 
result will be needed later on) to be always zero. For example, consider 
the figures 5, 1, 2, and 4 with mean = 3. The deviations from the 
mean are respectively +2, -2, -1, and +1 so the total deviation is 
zero. In general (using (2.1.6), (2.1.8), and (2.5.2)), 

$$\sum_{i=1}^{N} (x_i - \bar{x}) = \sum x_i - N\bar{x} = N\bar{x} - N\bar{x} = 0. \qquad (2.6.1)$$

If, however, the deviations are all taken as positive their sum (or mean) 
is a measure of scatter. 

The standard deviation and variance 

The standard deviation is also known, more descriptively, as the 
root mean square deviation. The population (or true) value will be 
denoted $\sigma(x)$. It is defined more exactly in § A1.2. The estimate of the 
population value calculated from a more or less small sample of, say, 
$N$ observations, the sample standard deviation, will be denoted $s(x)$. 
The square of this quantity is the estimated (or sample) variance (or 
mean square deviation) of $x$, var($x$) or $s^2(x)$. The population (or true) 
variance will be denoted 𝓋𝒶𝓇($x$) or $\sigma^2(x)$. The estimates are calculated as 

$$s(x) = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}}, \qquad \text{var}(x) \text{ or } s^2(x) = \frac{\sum (x_i - \bar{x})^2}{N - 1}. \qquad (2.6.2)$$



The standard deviation and variance are said to have $N - 1$ degrees of 
freedom. In calculating the mean value of $(x_i - \bar{x})^2$, $N - 1$, rather than $N$, 
is used because the sample mean $\bar{x}$ has been used in place of the population 
mean $\mu$. This would tend to make the estimate too small if $N$ were 
used (the deviations of the observations from $\mu$ will tend to be larger 
than the deviations from $\bar{x}$; this can be seen by putting $m = \mu$ in 
(2.5.6)). It is not difficult to show that the use of $N - 1$ corrects this 
tendency (see § A1.3).† It also shows that no information about scatter 
can be obtained from a single observation if $\mu$ is unknown (in this case 
the number of degrees of freedom, on which the accuracy of the estimate 
of the scatter depends, will be $N - 1 = 0$). If $\mu$ were known even a 
single observation could give information about scatter, and, as 
expected from the foregoing remarks, the estimated variance would be 
a straightforward mean square deviation using $N$ not $N - 1$. 

$$s^2(x) = \frac{\sum (x_i - \mu)^2}{N}. \qquad (2.6.3)$$



A numerical example of the calculation of the sample standard 
deviation is provided by the following sample of $N = 4$ observations 
with arithmetic mean $\bar{x} = 12/4 = 3$. 

            $x_i$     $x_i - \bar{x}$     $(x_i - \bar{x})^2$
              5          +2              +4
              1          -2              +4
              2          -1              +1
              4          +1              +1
    Totals   12           0              10

Thus $\sum_{i=1}^{4} (x_i - \bar{x})^2 = 10$ so, from (2.6.2), $s(x) = \sqrt{10/3} = 1.83$. 

† If the 'obvious' quantity, $N$, were used as the denominator in (2.6.2) the estimate 
of $\sigma^2$ would be biased even if the observations themselves were perfectly free of bias 
(systematic errors). This sort of bias results only from the way the observations are 
treated (another example occurs in § 12.8). Notice also that this implies that the mean 
of a very large number of values of $\sum (x - \bar{x})^2/N$ would tend towards too small a value, 
viz. 𝓋𝒶𝓇($x$) × $(N - 1)/N$, as the number of values, each calculated from a small sample, 
increases; whereas the same formula applied to a single very large sample would tend 
towards 𝓋𝒶𝓇($x$) itself as the size of the sample ($N$) increases. These results are proved in 
§ A1.3. It should be mentioned that unbiasedness is not the only criterion of a good 
statistic and other criteria give different divisors, for example $N$ or $N + 1$. 
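
The calculation of s(x), and the bias that results from dividing by N, can both be checked numerically. The sketch below is an illustration under a stated assumption: the four figures 5, 1, 2, 4 are treated, for the second part only, as a complete population from which every possible ordered sample of two is drawn with replacement, so that the long-run mean of the estimates can be found exactly by enumeration. The function names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product


def sum_sq_dev(xs):
    """Sum of squared deviations from the sample mean, as an exact fraction."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 2 for x in xs)


def mean_of_estimates(population, sample_size, divisor):
    """Average of sum_sq_dev / divisor over all ordered samples with replacement."""
    samples = list(product(population, repeat=sample_size))
    return sum(sum_sq_dev(s) / divisor for s in samples) / len(samples)


observations = [5, 1, 2, 4]
sample_sd = math.sqrt(sum_sq_dev(observations) / (len(observations) - 1))

# Treat the same four figures as a whole population (an assumption of this sketch).
population_variance = sum_sq_dev(observations) / len(observations)
with_n_minus_1 = mean_of_estimates(observations, 2, divisor=1)  # N - 1 = 1
with_n = mean_of_estimates(observations, 2, divisor=2)          # N = 2
print(round(sample_sd, 2), population_variance, with_n_minus_1, with_n)
```

For samples of two, the divisor N - 1 gives estimates that average exactly the population variance, whereas the divisor N gives estimates that average (N - 1)/N = 1/2 of it.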

The coefficient of variation 

This is simply the standard deviation expressed as a proportion (or 
percentage) of the mean (as long as the mean is not zero of course). 

$$C(x) = \frac{s(x)}{\bar{x}} \quad \text{(sample value)}, \qquad \mathcal{C}(x) = \frac{\sigma(x)}{\mu} \quad \text{(population value)}, \qquad (2.6.4)$$

where $\mu$ is the population mean value of $x$ (and of $\bar{x}$). $C(x)$ is an estimate, 
from a sample, of $\mathcal{C}(x)$. 

Whereas the standard deviation has the same dimensions (seconds, 
metres, etc.) as the mean, the coefficient of variation is a dimensionless 
ratio and gives the relative size of the standard deviation. If the scatter 
of means (see § 2.7), rather than the scatter of individual observations, 
were of interest $C(\bar{x})$ would be calculated with $s(\bar{x})$ in the numerator. 

In the numerical example above $C(x) = s(x)/\bar{x} = 1.83/3 = 0.61$, or 
$100C(x) = 100 \times 0.61 = 61$ per cent. 

The working formula for the sum of squared deviations 

When using a desk calculating machine it is inconvenient to form 
the individual deviations from the mean and the sum of squared devia- 
tions is usually found by using the following identity. Using (2.1.6) and 
(2.1.8), 

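$$\sum (x_i - \bar{x})^2 = \sum (x_i^2 - 2x_i\bar{x} + \bar{x}^2) = \sum x_i^2 - 2\bar{x}\sum x_i + N\bar{x}^2.$$

Now, since $\sum x_i = N\bar{x}$, this becomes 

$$\sum x_i^2 - \frac{2(\sum x_i)^2}{N} + \frac{(\sum x_i)^2}{N}$$

and thus 

$$\sum_{i=1}^{N} (x_i - \bar{x})^2 = \sum_{i=1}^{N} x_i^2 - \frac{\left( \sum_{i=1}^{N} x_i \right)^2}{N}. \qquad (2.6.5)$$

In the above example $\sum x_i^2 = 5^2 + 1^2 + 2^2 + 4^2 = 46$ and therefore 
$\sum (x_i - \bar{x})^2 = 46 - 12^2/4 = 46 - 36 = 10$, as found above. 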
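
The direct definition and the working formula (2.6.5) can be compared in a few lines (function names hypothetical). One caution that goes beyond the text: in floating-point arithmetic the working formula subtracts two nearly equal numbers when the mean is large compared with the scatter, and can then lose precision, so the direct form is often preferred on a computer.

```python
def sum_sq_direct(xs):
    """Sum of squared deviations formed from the individual deviations."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


def sum_sq_working(xs):
    """Working formula (2.6.5): sum of squares minus (sum)^2 / N."""
    return sum(x * x for x in xs) - sum(xs) ** 2 / len(xs)


observations = [5, 1, 2, 4]
print(sum_sq_direct(observations), sum_sq_working(observations))
```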

The covariance 

This quantity is a measure of the extent to which two variables are 
correlated. Uncorrelated events are defined as those that have zero 
covariance, and statistically independent events are necessarily 
uncorrelated (though uncorrelated events may not be independent; see 
§ 12.9). 

The true, or population, covariance of $x$ with $y$ will be denoted 
𝒸ℴ𝓋($x$, $y$), and the estimate of this quantity from a sample of observations 
is 

$$\text{cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{N - 1}. \qquad (2.6.6)$$

The numerator is called the sum of products. That the value of this 
expression will depend on the extent to which $y$ increases with $x$ is 
clear from Fig. 2.6.1 in which, for example, $y$ might represent body 
weight and $x$ calorie intake. Each point represents one pair of observations. 

If the graph is divided into quadrants drawn through the point 
$(\bar{x}, \bar{y})$ it can be seen that any point in the top right or in the bottom left 
quadrant will contribute a positive term $(x - \bar{x})(y - \bar{y})$ to the sum of 
products, whereas any point in the other two quadrants will contribute 
a negative value of $(x - \bar{x})(y - \bar{y})$. Therefore the points shown in 
Fig. 2.6.2(a) would have a large positive covariance, the points in 
Fig. 2.6.2(b) would have a large negative covariance, and those in 
Fig. 2.6.2(c) would have near zero covariance. 



The working formula for the sum of products 

A more convenient expression for the sum of products can be found
in a way exactly analogous to that used for the sum of squares (2.6.5).
It is

$$\sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{N}. \qquad (2.6.7)$$
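
As a check on (2.6.7), the sum of products can be found both ways for a small set of paired observations. This is a sketch only: the five pairs below are hypothetical figures chosen for easy arithmetic, not data from the text, and the names are illustrative.

```python
# Hypothetical paired observations (not taken from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Direct form: products of the deviations from the two means.
sp_direct = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Working formula (2.6.7).
sp_working = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

# Sample covariance, (2.6.6).
cov_xy = sp_working / (n - 1)

print(sp_direct, sp_working, cov_xy)
```

The two sums of products agree apart from rounding in the last decimal place, and the covariance is positive because y tends to increase with x in these figures.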



Fig. 2.6.1. Illustration of covariance. The graph of $y$ against $x$ is divided
into quadrants through the point $(\bar{x}, \bar{y})$. Top left: $(x - \bar{x})$ is negative,
$(y - \bar{y})$ is positive. Top right: $(x - \bar{x})$ is positive, $(y - \bar{y})$ is positive.
Bottom left: $(x - \bar{x})$ is negative, $(y - \bar{y})$ is negative. Bottom right:
$(x - \bar{x})$ is positive, $(y - \bar{y})$ is negative. For eleven of the thirteen observations,
the product $(x_i - \bar{x})(y_i - \bar{y})$ is positive; for the other two it is negative.

Fig. 2.6.2. Illustration of covariance: (a) positive covariance as in Fig. 2.6.1;
(b) negative covariance; (c) near zero covariance.



2.7. What is a standard error? Variances of functions of the 
observations. A reference list 

Prediction of variances without direct measurement 

A problem that recurs continually is the prediction of the scatter of
some function calculated from the observations, using the internal
evidence of one experiment.

Suppose, for example, that it is wished to know what degree of
confidence can be placed in an observed sample mean, the mean, $\bar{x}$, of a
single sample of, say, $N$ values of $x$ selected from some specified
population (see § 1.2). A numerical example is given below. The sample
mean, $\bar{x}$, is intended as an estimate of the population mean, $\mu$. How good
an estimate is it? The direct approach would be to repeat the
experiment many times, each experiment consisting of (a) making $N$
observations (i.e. selecting $N$ values of $x$ from the population) and (b)
calculating their mean, $\bar{x}$. In this way a large set of means would be obtained.
It would be expected that these means would agree with each other
more closely (i.e. would be more closely grouped about the population
mean $\mu$) than a large set of single observations. And it would be
expected that the larger the number ($N$) of observations averaged to
find each mean, the more closely the means would agree with each other.
If the set of means was large enough their distribution (see Chapter 4)
could be plotted. Its mean would be $\mu$, as for the $x$ values (= 'means of
samples of size $N = 1$'), but the standard deviation of the population
of $\bar{x}$ values, $\sigma(\bar{x})$ say, would be less than the standard deviation, $\sigma(x)$,
of the population of $x$ values, as shown in Fig. 2.7.1. The closeness with
which the means agree with each other is a measure of the confidence
that could be placed in a single mean as an estimate of $\mu$. And this
closeness can be measured by calculating the variance of the set of
sample means, using the means ($\bar{x}$ values) as the set of figures to which
(2.6.2) is applied, giving $\mathrm{var}(\bar{x})$, or $s^2(\bar{x})$, as an estimate of $\sigma^2(\bar{x})$, as
illustrated below.

If (2.6.2) was applied to a set of observations ($x$ values), rather than
a set of sample means, the result would be $\mathrm{var}(x)$, an estimate of the
scatter of repeated observations. As it has been mentioned that a set
of means would be expected to agree with each other more closely
than a set of single observations, it would be expected that $\mathit{var}(\bar{x})$
would be smaller than $\mathit{var}(x)$, and this is shown to be so below ((2.7.8)).
The standard deviation of the mean, $s(\bar{x}) = \sqrt{\mathrm{var}(\bar{x})}$, is often called the
sample standard error of the mean to distinguish it from $s(x)$, the sample
standard deviation of $x$, or 'sample standard deviation of the
observations'. This term is unnecessary and sample standard deviation of the
mean is a preferable name for $s(\bar{x})$.

The sample standard deviation of the observations, $s(x)$, is the
quantity of interest if one wants to estimate the scatter of $x$ values
(single observations from the population). In other words, it measures
the inherent variability of the population. Taking a larger sample
makes $s(x)$ a more accurate estimate of the population value, $\sigma(x)$.
On the other hand, the sample standard deviation of the mean, $s(\bar{x})$,
is the quantity of interest if one wants to estimate the accuracy of a
sample mean, $\bar{x}$. It is used if the object of making the observations is
to estimate the population mean, rather than to estimate the inherent
variability of the population. Taking a larger sample makes $s(\bar{x})$ smaller,
on the average, because it is an estimate of $\sigma(\bar{x})$ (the population
'standard error'), which is smaller than $\sigma(x)$.

Fig. 2.7.1. (a) Distribution of observations ($x$ values) in the population.
The area under the curve between any two $x$ values is the probability of an
observation falling between these two values, so the total area under the curve is
1.0 (see Chapter 4 for details). This particular distribution is Gaussian but the
results in this chapter are valid for any distribution (though the standard
deviation has a simple interpretation only when the distribution is Gaussian). The
mean value of $x$ is 4.0 and the standard deviation, $\sigma(x)$, is 1.0. (b) The distribution
of $\bar{x}$ values. $\bar{x}$ is the mean of a sample of four $x$ values from the population
represented in (a). The area under this curve must be 1.0, like the distribution in (a).
To keep the area the same, the distribution must be taller, because it is narrower
(i.e. the $\bar{x}$ values have less scatter than the $x$ values). The ordinate and abscissa
are drawn on the same scale in (a) and (b). The mean value of $\bar{x}$ is 4.0 and its
standard deviation, $\sigma(\bar{x})$ (the 'standard error of the mean'), is 0.5.

The standard deviation of the mean may sometimes be measured, for
special purposes, by making measurements of the sample mean (see
Chapter 11, for example). It is the purpose of this section to show that
the result of making repeated observations of the sample mean (or of
any other function calculated from the sample of observations), for
the purpose of measuring their scatter, can be predicted indirectly.
If the scatter of the means of four observations were required it could
be found by observing many such means and calculating the variance
of the resulting figures from (2.6.2), but alternatively it could be
predicted using (2.7.8) (below) even if there were only four observations
altogether, giving only a single mean. An illustration follows.

A numerical example to illustrate the idea of the standard deviation of the 
mean 

Suppose that one were interested in the precision of the mean found
by averaging 4 observations. It could be found by determining the
mean several times and seeing how closely the means agreed. Table
2.7.1 shows three sets of four observations. (It is not the purpose of this
section to show how results of this sort would be analysed in practice.
That is dealt with in Chapter 11.) The observations were all selected
randomly from a population known (because it was synthetic, not
experimentally observed) to have mean $\mu = 4.00$ and standard
deviation $\sigma(x) = 1.00$, as shown in Fig. 2.7.1(a).

Table 2.7.1

Three random samples, each with N = 4 observations, from a population
with mean $\mu = 4.00$ and standard deviation $\sigma(x) = 1.00$

                           Sample 1    Sample 2    Sample 3

x values                     3.39        5.88        3.79
                             3.38        2.45        3.26
                             3.89        2.21        4.07
                             8.38        5.96        3.21

Sample mean, $\bar{x}$             4.40        4.12        3.58      Grand mean = 4.04
Sample standard
  deviation, $s(x)$          1.33        2.08        0.420



The sample means, $\bar{x}$, of the three samples, 4.40, 4.12, and 3.58, are all
estimates of $\mu = 4.00$. The grand sample mean, 4.04, is also an estimate
of $\mu = 4.00$ (see Appendix 1). The standard deviations, $s(x)$, of each
of the three samples, 1.33, 2.08, and 0.420, are all estimates of
the population standard deviation, $\sigma(x) = 1.00$ (a better estimate
could be found by averaging, or pooling, these three estimates as
described in § 11.4). The population standard deviation of the mean
can be estimated directly by calculating the standard deviation of the
sample of three means (4.40, 4.12, 3.58) using (2.6.2). This gives

$$s(\bar{x}) = \sqrt{\frac{(4.40 - 4.04)^2 + (4.12 - 4.04)^2 + (3.58 - 4.04)^2}{3 - 1}} = 0.420.$$

Now according to (2.7.9) (see below), if we had an infinite number of
$\bar{x}$ values instead of only 3, their standard deviation would be
$\sigma(\bar{x}) = \sigma(x)/\sqrt{N} = 1.00/\sqrt{4} = 0.5$ (see Fig. 2.7.1(b)), and $s(\bar{x}) = 0.420$ is
a sample estimate of this quantity. And, furthermore, if we had only one
sample of observations it would still be possible to estimate indirectly
$\sigma(\bar{x}) = 0.5$, by using (2.7.9). For example, with only the first group,
$s(\bar{x}) = s(x)/\sqrt{N} = 1.33/\sqrt{4} = 0.665$ could be found as an estimate of
$\sigma(\bar{x}) = 0.500$, i.e. as a prediction of what the scatter of means would
be if they were repeatedly determined. (This prediction refers to
repeated samples from the same population. If the repeated samples
were from different populations the prediction would be an
underestimate, as described in Chapter 11.)

A reference list

The problem dealt with throughout this section has been that of
predicting what the scatter of the values of various functions of the
observations (such as the mean) would be if repeated samples were
taken and the value of the function calculated from each. The aim is
to predict this, given a single sample containing any number of
observations that happen to be available (not fewer than two of course).

The relationships listed below will be referred to frequently later on.
The derivations should really be carried out using the definition of
the population variance (§ A1.2 and Brownlee (1966, p. 57)), but it
will, for now, be sufficient to use the sample variance (2.6.2). The
results, however, are given properly in terms of population variances.
The notation was defined in §§ 2.1 and 2.6.

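Variance of the sum or difference of two variables. Given the variance of
the values of a variable $x$, and of another variable $y$, what is the
predicted variance of the figures found by adding an $x$ value to a $y$ value?
From (2.6.2),

$$\mathrm{var}(x + y) = \frac{\sum [(x_i + y_i) - (\overline{x + y})]^2}{N - 1}.$$

Now, since $(\overline{x + y}) = \sum (x_i + y_i)/N = \sum x_i/N + \sum y_i/N = \bar{x} + \bar{y}$ (from (2.1.8)),
this can be rearranged giving

$$\mathrm{var}(x + y) = \frac{\sum [(x_i - \bar{x}) + (y_i - \bar{y})]^2}{N - 1}
= \frac{\sum [(x_i - \bar{x})^2 + (y_i - \bar{y})^2 + 2(x_i - \bar{x})(y_i - \bar{y})]}{N - 1}$$

$$= \frac{\sum (x_i - \bar{x})^2}{N - 1} + \frac{\sum (y_i - \bar{y})^2}{N - 1} + \frac{2\sum (x_i - \bar{x})(y_i - \bar{y})}{N - 1},$$

suggesting, from (2.6.2) and (2.6.6), the general rule

$$\mathit{var}(x + y) = \mathit{var}(x) + \mathit{var}(y) + 2\,\mathit{cov}(x, y). \qquad (2.7.1)$$

By a similar argument the variance of the difference between two
variables is found to be

$$\mathit{var}(x - y) = \mathit{var}(x) + \mathit{var}(y) - 2\,\mathit{cov}(x, y). \qquad (2.7.2)$$

Thus if the variables are uncorrelated, i.e. if $\mathit{cov}(x, y) = 0$, then the
variance of either the sum or the difference is simply the sum of the
separate variances

$$\mathit{var}(x + y) = \mathit{var}(x - y) = \mathit{var}(x) + \mathit{var}(y). \qquad (2.7.3)$$

If variables are independent they are necessarily uncorrelated (see
§§ 2.4 and 12.9), so (2.7.3) is valid for independent variables.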
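
Because the argument leading to (2.7.1) and (2.7.2) is pure algebra on the sample formulas (2.6.2) and (2.6.6), the two rules hold exactly for sample variances and covariances. The sketch below checks this on hypothetical paired figures (not from the text); the helper `covariance` is written out here for illustration only.

```python
from statistics import variance   # sample variance, divisor N - 1

def covariance(x, y):
    """Sample covariance with divisor N - 1, as in (2.6.6)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)

# Hypothetical paired observations (not taken from the text).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]

sums = [a + b for a, b in zip(x, y)]
diffs = [a - b for a, b in zip(x, y)]

# Left-hand sides: variances of the sums and differences, found directly.
var_sum, var_diff = variance(sums), variance(diffs)

# Right-hand sides of (2.7.1) and (2.7.2).
rhs_sum = variance(x) + variance(y) + 2 * covariance(x, y)
rhs_diff = variance(x) + variance(y) - 2 * covariance(x, y)

print(var_sum, rhs_sum)
print(var_diff, rhs_diff)
```

With these figures the covariance is positive, so the sums are more scattered, and the differences less scattered, than (2.7.3) would suggest for uncorrelated variables.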

Variance of the sum of N variables. By a simple extension of the above
argument for two variables, if $x_1, x_2, \ldots, x_N$ are $N$ uncorrelated
variables then (2.7.3) can be generalized giving

$$\mathit{var}\left(\sum x_i\right) = \mathit{var}(x_1 + x_2 + \ldots + x_N)$$

$$= \mathit{var}(x_1) + \mathit{var}(x_2) + \ldots + \mathit{var}(x_N)$$

$$= \sum \mathit{var}(x_i), \text{ or } N\,\mathit{var}(x), \qquad (2.7.4)$$

the second form being appropriate (cf. (2.1.6)) if all the $x_i$ have the
same variance, $\mathit{var}(x)$.

The effect of multiplying each x by a constant factor. If $a$ is a constant
then, by (2.6.2), the variance of the set of figures found by multiplying
each of a set of $x$ values by the same figure, $a$, will be

$$\mathrm{var}(ax) = \frac{\sum (ax_i - a\bar{x})^2}{N - 1} = \frac{\sum [a(x_i - \bar{x})]^2}{N - 1} = \frac{a^2 \sum (x_i - \bar{x})^2}{N - 1},$$

suggesting the general rule

$$\mathit{var}(ax) = a^2\,\mathit{var}(x); \qquad (2.7.5)$$

and similarly from (2.6.6),

$$\mathit{cov}(ax, by) = ab\,\mathit{cov}(x, y) \qquad (2.7.6)$$

where $a$ and $b$ are constants.

The effect of adding a constant to each x. By similar arguments to those
above it can be seen, from (2.6.2), that adding a constant has no effect
on the scatter.

$$\mathit{var}(a + x) = \mathit{var}(x). \qquad (2.7.7)$$
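
Rules (2.7.5) and (2.7.7) are easily verified on the four observations used in § 2.6. This is a sketch only; the constant a = 3 is an arbitrary choice made for illustration.

```python
from statistics import variance   # sample variance, divisor N - 1

x = [5.0, 1.0, 2.0, 4.0]          # observations from the example in § 2.6
a = 3.0                           # an arbitrary constant (illustrative)

scaled = [a * xi for xi in x]     # each x multiplied by a
shifted = [a + xi for xi in x]    # a added to each x

print(variance(x))                # 10/3, about 3.33
print(variance(scaled))           # a**2 times larger, by (2.7.5)
print(variance(shifted))          # unchanged, by (2.7.7)
```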

The variance of the mean of N observations and the standard error. This
relationship, the answer to the problem of how to estimate indirectly
the scatter of repeated observations of the mean discussed, with a
numerical example, at the beginning of this section, follows from those
already given.

$$\mathit{var}(\bar{x}) = \mathit{var}\left(\frac{\sum x_i}{N}\right) = \frac{1}{N^2}\,\mathit{var}\left(\sum x_i\right) \quad \text{(from (2.7.5))}$$

$$= \frac{1}{N^2}\,N\,\mathit{var}(x) \quad \text{(from (2.7.4))}$$

and therefore the variance of the mean is

$$\mathit{var}(\bar{x}) = \frac{\mathit{var}(x)}{N} \qquad (2.7.8)$$

and the standard deviation of the mean (the standard error, see discussion
above) is

$$\sigma(\bar{x}) = \sqrt{\mathit{var}(\bar{x})} = \frac{\sigma(x)}{\sqrt{N}}. \qquad (2.7.9)$$

Notice that $\mathrm{var}(x)$, being an average (like $\bar{x}$), will be more or less the
same whatever size sample is used to estimate it (though a larger
sample will give a more precise value), whereas $\mathrm{var}(\bar{x})$ becomes smaller
as the number of observations averaged increases, as expected from the
discussion at the beginning of this section and from (2.7.8).
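
The direct and indirect approaches discussed in this section can be compared by simulation. The sketch below assumes a Gaussian population with mean 4.00 and standard deviation 1.00, as in Fig. 2.7.1, and samples of N = 4; the number of repeated samples and the random seed are arbitrary choices, and the names are illustrative.

```python
import random
from statistics import mean, stdev

random.seed(1)                 # arbitrary seed, only to make the run repeatable
MU, SIGMA, N = 4.0, 1.0, 4     # population mean, standard deviation, sample size
N_SAMPLES = 20000              # number of repeated samples (an arbitrary choice)

# Direct approach: draw many samples of N observations and keep each mean.
sample_means = [
    mean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(N_SAMPLES)
]
direct_sd_of_mean = stdev(sample_means)

# Indirect prediction from (2.7.9), using the known population value.
predicted_sd_of_mean = SIGMA / N ** 0.5

# Prediction from a single sample: Sample 1 of Table 2.7.1 has s(x) = 1.33.
single_sample_estimate = 1.33 / N ** 0.5

print(round(direct_sd_of_mean, 3), predicted_sd_of_mean,
      round(single_sample_estimate, 3))
```

The scatter of the simulated means should lie close to the predicted value of 0.5, whereas the estimate from a single sample of four (0.665) is, as would be expected, a good deal less precise.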

The variance of a linear function of the observations. A linear function of the
observations $x_1, x_2, \ldots, x_n$ is defined as

$$L = a_0 + a_1x_1 + a_2x_2 + \ldots + a_nx_n = a_0 + \sum_{i=1}^{n} a_ix_i$$

where the $a_i$ are constants. From (2.7.7) it can be seen that $a_0$ has no effect on the
variance and can be ignored. Using (2.7.4) it can be seen that, if the observations
are uncorrelated,

$$\mathit{var}(L) = \mathit{var}\left(\sum a_ix_i\right) = \sum \mathit{var}(a_ix_i)$$

and using (2.7.5) this becomes

$$\mathit{var}(L) = \sum (a_i^2\,\mathit{var}(x_i)), \qquad (2.7.10)$$

or $\mathit{var}(L) = \mathit{var}(x) \cdot \sum a_i^2$

if the variances of all the $x_i$ are the same, $\mathit{var}(x)$ say.

If the $x_i$ do not have zero covariances (are not uncorrelated) a more general
form is necessary. Using (2.7.5), (2.7.6), and an extension of (2.7.1) this is found
to be

$$\mathit{var}(L) = \sum_i a_i^2\,\mathit{var}(x_i) + 2\sum_i \sum_{j > i} a_ia_j\,\mathit{cov}(x_i, x_j), \qquad (2.7.11)$$

where the second term is the sum over all possible pairs of covariances. For example
if $L = a_1x_1 + a_2x_2 + a_3x_3$, then $\mathit{var}(L) = a_1^2\,\mathit{var}(x_1) + a_2^2\,\mathit{var}(x_2) + a_3^2\,\mathit{var}(x_3) + 2a_1a_2\,\mathit{cov}(x_1, x_2) + 2a_1a_3\,\mathit{cov}(x_1, x_3) + 2a_2a_3\,\mathit{cov}(x_2, x_3)$.
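
The general form (2.7.11) can be checked in the same way as the rule for a sum, since it too holds exactly when sample variances and covariances are used throughout. In the sketch below the three sets of observations and the constants are hypothetical figures chosen for illustration, and the helper `covariance` is not part of the text.

```python
from statistics import variance   # sample variance, divisor N - 1

def covariance(x, y):
    """Sample covariance with divisor N - 1, as in (2.6.6)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    return sum((p - x_bar) * (q - y_bar) for p, q in zip(x, y)) / (n - 1)

# Hypothetical data: three variables, each observed on the same five occasions.
xs = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 3.0, 6.0],
    [5.0, 3.0, 4.0, 1.0, 2.0],
]
a0 = 10.0                  # additive constant; it should not affect the variance
a = [2.0, -1.0, 0.5]       # hypothetical constants a_1, a_2, a_3

# Value of L on each occasion, and its variance found directly.
L = [a0 + sum(ai * x[k] for ai, x in zip(a, xs)) for k in range(5)]
direct = variance(L)

# Variance predicted from (2.7.11): variance terms plus all pairs of covariances.
n_vars = len(xs)
predicted = sum(a[i] ** 2 * variance(xs[i]) for i in range(n_vars))
predicted += 2 * sum(
    a[i] * a[j] * covariance(xs[i], xs[j])
    for i in range(n_vars)
    for j in range(i + 1, n_vars)
)

print(direct, predicted)
```

Omitting the covariance terms, i.e. using (2.7.10), would give the correct answer only if the three variables were uncorrelated, which these figures are not.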

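The variance of the weighted arithmetic mean. The variance of the weighted mean,
defined in (2.6.1), follows from (2.7.5) and (2.7.10).

$$\mathit{var}\left(\frac{\sum w_ix_i}{\sum w_i}\right) = \frac{\sum [w_i^2\,\mathit{var}(x_i)]}{(\sum w_i)^2}.$$

Now if $w_i = 1/\mathit{var}(x_i)$, as discussed in § 2.6, then $\sum [w_i^2\,\mathit{var}(x_i)] = \sum w_i$, so

$$\mathit{var}\left(\frac{\sum w_ix_i}{\sum w_i}\right) = \frac{1}{\sum w_i}, \qquad (2.7.12)$$

and if all the weights (variances) are the same this reduces to (2.7.8).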

The approximate variance of any function. The variance of any function $f(x_1, x_2, \ldots, x_n)$
of the uncorrelated variables $x_1, x_2, \ldots, x_n$ is given approximately
(taking only the linear terms in a Taylor series expansion of $f$) by

$$\mathit{var}(f) \simeq \left(\frac{\partial f}{\partial x_1}\right)^2 \mathit{var}(x_1) + \left(\frac{\partial f}{\partial x_2}\right)^2 \mathit{var}(x_2) + \ldots + \left(\frac{\partial f}{\partial x_n}\right)^2 \mathit{var}(x_n) \qquad (2.7.13)$$

if the variances are reasonably small relative to the means, so that the function
can be represented approximately by a straight line in the range over which each
$x$ varies. The derivatives should be evaluated at the true mean value of the $x$
variables. If the $x$ variables are correlated then terms involving their covariances
must be added as shown below. For discussion and the derivation see, for example,
Lindley (1965, p. 134), Brownlee (1966, p. 144), and Kendall and Stuart (1963,
p. 231).

If $f$ is a linear function then (2.7.13) reduces to (2.7.10), which is exact. If $f$ is
not linear then the result is only approximate and furthermore $f$ will not have the
same distribution of errors as the $x$ variables so if, for example, the $x$ values were
normally distributed (see § 4.2) $f$ would not be normally distributed, so its
variance, even if it were exact, could not be interpreted in any simple way.

The variance of $\log_e x$. If the true mean value of $x$ is $\mu$ then using the version
of (2.7.13) for a single variable gives

$$\mathit{var}(\log_e x) \simeq \left(\frac{\mathrm{d} \log_e x}{\mathrm{d}x}\right)^2 \mathit{var}(x) = \frac{\mathit{var}(x)}{\mu^2}. \qquad (2.7.14)$$

Therefore the standard deviation of $\log_e x$ is approximately equal to the coefficient
of variation of $x$, $\mathcal{C}(x)$, defined by (2.6.4). If the standard deviation of $x$ increases
in proportion to the true mean value of $x$, so that the coefficient of variation of $x$
is constant, the standard deviation of $\log_e x$ will be approximately constant
(cf. §§ 11.2 and 12.2).
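
A simulation gives some feeling for how good the approximation (2.7.14) is. The sketch below assumes Gaussian observations with a coefficient of variation of 5 per cent at two very different mean values; the distribution, the means, the seed, and the number of simulated values are all arbitrary choices made for illustration.

```python
import math
import random
from statistics import stdev

random.seed(2)   # arbitrary seed, only to make the run repeatable

def sd_of_log(mu, sigma, n=20000):
    """Simulated standard deviation of log_e(x) for Gaussian x (an assumption)."""
    logs = [math.log(random.gauss(mu, sigma)) for _ in range(n)]
    return stdev(logs)

# Same coefficient of variation (0.05) at two different means.
sd_log_small_mean = sd_of_log(100.0, 5.0)
sd_log_large_mean = sd_of_log(1000.0, 50.0)

print(round(sd_log_small_mean, 4), round(sd_log_large_mean, 4))
```

Both figures should be close to 0.05, the coefficient of variation of x, although the standard deviation of x itself differs tenfold between the two cases.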

The variance of the product of two variables, $x_1x_2$. In this case an exact result
can be derived for the variance of values of $x_1x_2$, given the variances of $x_1$ and of
$x_2$. Suppose that $x_1$ and $x_2$ are independent of each other, and have population
means $\mu_1$ and $\mu_2$ respectively. Then†

$$\mathit{var}(x_1x_2) = \mathit{var}(x_1)\,\mathit{var}(x_2) + \mu_1^2\,\mathit{var}(x_2) + \mu_2^2\,\mathit{var}(x_1).$$

If this result is divided through by $(\mu_1\mu_2)^2$, it can be expressed in terms of
coefficients of variation, defined in (2.6.4), as

$$\mathcal{C}^2(x_1x_2) = \mathcal{C}^2(x_1)\,\mathcal{C}^2(x_2) + \mathcal{C}^2(x_1) + \mathcal{C}^2(x_2). \qquad (2.7.15)$$

† Proof. From appendix equation (A1.2.2), $\mathit{var}(x_1x_2) = \mathrm{E}(x_1^2x_2^2) - [\mathrm{E}(x_1x_2)]^2$. Now
$\mathrm{E}(x_1x_2) = \mu_1\mu_2$ and $\mathrm{E}(x_1^2x_2^2) = \mathrm{E}(x_1^2) \cdot \mathrm{E}(x_2^2)$ if, as supposed, $x_1$ and $x_2$ are independent.
Also, from (A1.2.2), $\mathrm{E}(x_1^2) = \mathit{var}(x_1) + \mu_1^2$, and similarly for $x_2$. Thus

$$\mathit{var}(x_1x_2) = \mathrm{E}(x_1^2) \cdot \mathrm{E}(x_2^2) - \mu_1^2\mu_2^2$$

$$= (\mathit{var}(x_1) + \mu_1^2)(\mathit{var}(x_2) + \mu_2^2) - \mu_1^2\mu_2^2$$

$$= \mathit{var}(x_1)\,\mathit{var}(x_2) + \mu_2^2\,\mathit{var}(x_1) + \mu_1^2\,\mathit{var}(x_2)$$

as stated above.

It is interesting to compare this with the result of applying the approximate
formula, (2.7.13), viz.

$$\mathit{var}(x_1x_2) \simeq \left(\frac{\partial (x_1x_2)}{\partial x_1}\right)^2 \mathit{var}(x_1) + \left(\frac{\partial (x_1x_2)}{\partial x_2}\right)^2 \mathit{var}(x_2)$$

$$= \mu_2^2\,\mathit{var}(x_1) + \mu_1^2\,\mathit{var}(x_2),$$

or, again dividing through by $(\mu_1\mu_2)^2$ to get the result in terms of coefficients of
variation,

$$\mathcal{C}^2(x_1x_2) \simeq \mathcal{C}^2(x_1) + \mathcal{C}^2(x_2).$$

By comparison with (2.7.15) it appears that use of the approximate formula
involves neglecting the term $\mathcal{C}^2(x_1)\,\mathcal{C}^2(x_2)$. The approximation involved can be
illustrated by two numerical examples.

First, suppose that both $x_1$ and $x_2$ have coefficients of variation of 50 per cent,
i.e. $\mathcal{C}(x_1) = 0.5$, $\mathcal{C}(x_2) = 0.5$. In this case (2.7.15) gives

$$\mathcal{C}(x_1x_2) = \sqrt{(0.5^2 \times 0.5^2) + 0.5^2 + 0.5^2} = \sqrt{0.0625 + 0.25 + 0.25} = 0.750,$$

i.e. 75.0 per cent. The approximate form gives

$$\mathcal{C}(x_1x_2) \simeq \sqrt{0.5^2 + 0.5^2} = 0.707,$$

i.e. 70.7 per cent. Secondly, consider more accurate observations, say $100\,\mathcal{C}(x_1) = 5$
per cent and $100\,\mathcal{C}(x_2) = 5$ per cent. Similar calculations show that (2.7.15)
gives $100\,\mathcal{C}(x_1x_2) = 7.075$ per cent, whereas the approximate form gives
$100\,\mathcal{C}(x_1x_2) \simeq 7.071$ per cent. The more accurate the observations, the better the
approximate version will be.
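
The two numerical examples can be reproduced with a few lines of arithmetic. This is a sketch only; the function names are illustrative and the formulas are simply the exact and approximate expressions for the coefficient of variation of a product of two independent variables given above.

```python
import math

def cv_product_exact(c1, c2):
    """Exact coefficient of variation of x1*x2 for independent x1 and x2."""
    return math.sqrt(c1 ** 2 * c2 ** 2 + c1 ** 2 + c2 ** 2)

def cv_product_approx(c1, c2):
    """Approximate form, from (2.7.13), neglecting the product term."""
    return math.sqrt(c1 ** 2 + c2 ** 2)

for c in (0.5, 0.05):   # coefficients of variation of 50 and 5 per cent
    exact = 100 * cv_product_exact(c, c)
    approx = 100 * cv_product_approx(c, c)
    print(f"C = {100 * c:g} per cent: exact {exact:.3f}, approximate {approx:.3f}")
```

The neglected term is the product of the two squared coefficients of variation, so it shrinks much faster than the terms that are kept as the observations become more accurate.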

The variance of the ratio of two variables, $x_1/x_2$. Using (2.7.13) gives, in terms of
coefficients of variation, the approximate result

$$\mathcal{C}^2(x_1/x_2) \simeq \mathcal{C}^2(x_1) + \mathcal{C}^2(x_2). \qquad (2.7.16)$$

An exact treatment for the ratio of two normally distributed variables is given in
§ 13.6 and exemplified in §§ 13.11-13.16.

The variance of the reciprocal of a variable, $1/x$. According to (2.7.13),

$$\mathit{var}(1/x) \simeq \left(\frac{\mathrm{d}(1/x)}{\mathrm{d}x}\right)^2 \mathit{var}(x) = \frac{\mathit{var}(x)}{\mu^4}. \qquad (2.7.17)$$

The weight (see § 2.6) to be attached to a value of $1/x$ is therefore approximately
proportional to the fourth power of $x$ if $\mathit{var}(x)$ is constant! This explains why
plots involving reciprocal transformations may give bad results (see § 12.8
for details) if not correctly weighted.
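
The practical force of (2.7.17) is most easily seen from a few figures. The sketch below assumes a constant variance of x and takes the weight as the reciprocal of the approximate variance of 1/x; the x values and the variance are hypothetical, and the function names are illustrative.

```python
def var_reciprocal(mu, var_x):
    """Approximate variance of 1/x from (2.7.17)."""
    return var_x / mu ** 4

def weight_of_reciprocal(mu, var_x):
    """Weight taken as the reciprocal of the approximate variance of 1/x."""
    return 1.0 / var_reciprocal(mu, var_x)

VAR_X = 0.01                      # hypothetical constant variance of x
x_values = [1.0, 2.0, 5.0, 10.0]  # hypothetical mean values of x

# Weights relative to the weight at the smallest x value.
base = weight_of_reciprocal(x_values[0], VAR_X)
relative_weights = [weight_of_reciprocal(x, VAR_X) / base for x in x_values]

for x, w in zip(x_values, relative_weights):
    print(f"x = {x:g}: relative weight of 1/x = {w:g}")
```

A tenfold change in x thus corresponds to something like a ten-thousandfold change in the weight of 1/x, which is why an unweighted reciprocal plot can be dominated by its least reliable points.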

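Correlated variables. In the simplest case of two correlated variables, $x_1$ and $x_2$,
the appropriate extension of (2.7.13) is

$$\mathit{var}[f(x_1, x_2)] \simeq \left(\frac{\partial f}{\partial x_1}\right)^2 \mathit{var}(x_1) + \left(\frac{\partial f}{\partial x_2}\right)^2 \mathit{var}(x_2) + 2\left(\frac{\partial f}{\partial x_1}\right)\left(\frac{\partial f}{\partial x_2}\right) \mathit{cov}(x_1, x_2). \qquad (2.7.18)$$

This relationship is referred to in § 13.6. For a linear function this reduces to the
two variable case of (2.7.11). The $n$ variable extension of (2.7.18) involves all
possible pairs of $x$ variables in the same way as (2.7.11).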



Sum of a variable number of random variables. Let $S$ denote the sum of a randomly
variable number of random variables

$$S = \sum_{i=1}^{m} z_i$$

where the $z_i$ are independent variables with coefficient of variation $\mathcal{C}(z)$ and $m$ is a
random variable with coefficient of variation $\mathcal{C}(m)$. If each $S$ is made up of a
random sample (of variable size) from the population of $z$ values, then it is shown
in § A1.4 that the coefficient of variation of $S$ is

$$\mathcal{C}(S) = \sqrt{\frac{\mathcal{C}^2(z)}{\mu_m} + \mathcal{C}^2(m)} \qquad (2.7.19)$$

where $\mu_m$ is the population mean value of $m$ (size of the sample). This result is
illustrated on p. 58.
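
The result for the sum of a variable number of variables can be checked by simulation, taking the coefficient of variation of S to be the square root of the squared coefficient of variation of z divided by the mean of m, plus the squared coefficient of variation of m. The sketch below rests on assumptions that are not in the text: the z values are taken as Gaussian with mean 10 and standard deviation 2, and m is taken as equally likely to be any whole number from 1 to 9, independently of the z values.

```python
import random
from statistics import mean, pstdev

random.seed(3)               # arbitrary seed, only to make the run repeatable

MU_Z, SD_Z = 10.0, 2.0       # hypothetical population of z values (Gaussian)
M_VALUES = range(1, 10)      # hypothetical m: equally likely to be 1, 2, ..., 9

def one_sum():
    """One value of S: a random number m of z values, added together."""
    m = random.choice(M_VALUES)
    return sum(random.gauss(MU_Z, SD_Z) for _ in range(m))

sums = [one_sum() for _ in range(50000)]
simulated_cv = pstdev(sums) / mean(sums)

# Prediction from the population values of z and m.
cv_z = SD_Z / MU_Z
mu_m = mean(M_VALUES)
cv_m = pstdev(M_VALUES) / mu_m
predicted_cv = (cv_z ** 2 / mu_m + cv_m ** 2) ** 0.5

print(round(simulated_cv, 3), round(predicted_cv, 3))
```

In this illustration most of the scatter of S comes from the variability of m; the contribution of the scatter of the individual z values is reduced by the averaging implied by the divisor, the mean of m.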






3. Theoretical distributions: binomial 
and Poisson 



The variability of experimental results is often assumed to be of a
known mathematical form (or distribution) to make statistical analysis
easy, though some methods of analysis need far more assumptions than
others. These theoretical distributions are mathematical idealizations.
The only reason for supposing that they will represent any phenomenon
in the real world is comparison of real observations with theoretical
predictions. This is only rarely done.

3.1. The idea of a distribution 

If it were desired to discover the proportion of European adults who 
drink Scotch whisky then the population involved is the set of all 
European adults. From this population a strictly random sample 
(see § 2.3) of, say, twenty-five Europeans might be examined and 
the proportion of whisky drinkers in this sample taken as an estimate 
of the proportion in the population. 

A similar statistical problem is encountered when, for example, the 
true concentration of a drug solution is estimated from the mean of a 
few experimental estimates of its concentration. Although it is con- 
venient still to regard the experimental observations as samples from a 
population, it is apparent that in this case, unlike that discussed in the 
previous paragraph, the population has no physical reality but consists 
of the infinite set of all valid observations that might have been made. 

The first example illustrates the idea of a discontinuous probability
distribution (it is not meant to illustrate the way in which a single
sample would be analysed). If very many samples, each of 25
Europeans, were examined it would not be expected that all the samples
would contain exactly the same number of whisky drinkers. If the
proportion of whisky drinkers in the whole population of European
adults were 0.3 (i.e. 30 per cent) then it might reasonably be expected
that samples containing about 7 or 8 cases would appear more frequently
than samples containing any other number because 0.3 × 25 = 7.5.
However samples containing about 5 or 10 cases would be frequent,
and 3 (or fewer) or 13 (or more) drinkers would appear in roughly 1 in
20 samples. If a sufficient number of samples were taken it should be
possible to discover something approaching the true proportion of
samples containing r drinkers, r being any specified number between
0 and 25. These figures are called the probability distribution of the
proportion of whisky drinkers in a sample of 25 and since this
proportion is a discontinuous variable (the number of drinkers per sample
must be a whole number) the distribution is described as discontinuous.
The distribution is usually plotted as a block histogram as shown in
Fig. 3.4.1 (p. 62), the block representing, say, 6 drinkers extending
from 5.5 to 6.5 along the abscissa.
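
The experiment of examining very many samples of 25 can be imitated by simulation. The sketch below assumes strictly random sampling from a population so large that the proportion of drinkers stays at 0.3 throughout; the number of samples, the seed, and the names are arbitrary choices made for illustration.

```python
import random
from collections import Counter

random.seed(4)                    # arbitrary seed, only to make the run repeatable
P_DRINKER = 0.3                   # proportion of drinkers in the population
SAMPLE_SIZE = 25
N_SAMPLES = 20000                 # number of samples examined (an arbitrary choice)

def drinkers_in_one_sample():
    """Number of drinkers in one random sample of SAMPLE_SIZE people."""
    return sum(random.random() < P_DRINKER for _ in range(SAMPLE_SIZE))

counts = Counter(drinkers_in_one_sample() for _ in range(N_SAMPLES))

# Proportion of samples containing r drinkers, for r = 0, 1, ..., 25.
proportions = {r: counts[r] / N_SAMPLES for r in range(SAMPLE_SIZE + 1)}

most_frequent = max(proportions, key=proportions.get)
extreme = sum(p for r, p in proportions.items() if r <= 3 or r >= 13)

print(most_frequent, round(extreme, 3))
```

The most frequent number of drinkers should be 7 or 8, and the proportion of samples with 3 or fewer, or 13 or more, drinkers should be in the region of 1 in 20.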

The second example, concerning the estimation of the true concentra- 
tion of a drug solution, leads to the idea of a continuous probability 
distribution. If many estimates were made of the same concentration 
it would be expected that the estimates would not be identical. By 
analogy with the discontinuous case just discussed it should be possible, 
if a large enough number of estimates were made, to find the proportion 
of estimates having any given value. However, since the concentration 
is a continuous variable the problem is more difficult because the 
proportion of estimates having exactly any given value (e.g. exactly 
12 μg/ml, that is 12.00000000... μg/ml) will obviously in principle be
indefinitely small (in fact experimental difficulties will mean that the 
answer can only be given to, say, three significant figures so that in 
practice the concentration estimate will be a discontinuous variable). 
The way in which this difficulty is overcome is discussed in § 4.1. 

3.2. Simple sampling and the derivation of the binomial 
distribution through examples 

The binomial distribution predicts the probability, P(r), of observing 
any specified number (r) of 'successes' in a series of n independent trials 
of an event, when the outcome of a trial can be of only two sorts 
('success' or 'failure'), and when the probability of obtaining a 'success'
is constant from trial to trial. If the conditions of independence and 
constant probability are fulfilled the process of taking a sample (of n 
trials) is described as simple sampling. When there are more than two 
possible outcomes a generalization of the binomial distribution known 
as the multinomial distribution is appropriate. Often it will not be 
possible a priori to assume that sampling is simple and when this is so 
it must be found out by experiment whether the observations are 
binomially distributed or not. 

The example in this section is intended to illustrate the nature of the
binomial distribution. It would not be a well-designed experiment to
test a new drug because it does not include a control or placebo group.
Suitable experimental designs are discussed in Chapters 8-11.

Suppose that n trials are made of a new drug. In this case 'one trial
of an event' is one administration of the drug to a patient. After each
trial it is recorded whether the patient's condition is apparently better 
(outcome B) or apparently worse (outcome W). It is assumed for the 
moment that the method of measurement is sensitive enough to rule 
out the possibility of no change being observed. 

The derivation of the binomial distribution specifies that the prob- 
ability of obtaining a success shall be the same at every trial. What 
exactly does this mean? If the n trials were all conducted on the same 
patient this would imply that the patient's reaction to the drug must 
not change with time, and the condition of independence of trials 
implies that the result of a trial must not be affected by the result of 
previous trials. The result would be an estimate of the probability of 
the drug producing an improvement in the single patient tested. 
Under these conditions the proportion of successes in repeated sets of n
trials should follow the binomial distribution. 

At first sight it might be thought, because it is doubtless true that
the probability of a success, outcome B, will differ from patient to
patient, that if the n trials were conducted on n different patients,
the proportion of successes in repeated sets of n trials would not follow
the binomial distribution. This would quite probably be so if, for
example, each set of n patients was selected in a different part of the 
country. However, if the sets of n patients were selected strictly at 
random (see § 2.3) from a large population of patients, then the propor- 
tion of patients in the population who will show outcome B (i.e. the 
probability, given random sampling, of outcome B) would not change 
between the selection of one patient and the next, or between the 
selection of one sample of n patients and the next. Therefore the 
conditions of constant probability and independence would be met in 
spite of the fact that patients differ in their reactions to drugs. Notice 
the critical importance of strictly random selection of samples, already 
emphasized in § 2.3. 

From the rules of probability discussed in § 2.4 it is easy to find
the probability of any specified result (number of successes out of n
trials) if $\mathcal{P}(\mathrm{B})$, the true (population) proportion of cases in which the
patient improves, is known. This is a deductive, rather than inductive,
procedure. A true probability is given and the probability of a particular
result calculated. The reverse process, the inference of the population
proportion from a sample, is discussed later in § 7.7 and exemplified
by the assay of purity in heart by the Oakley method, described in
§ 7.8.

Two different drugs will be considered. 

1. Suppose that drug X is completely inactive, but nevertheless
50 per cent of patients, in the long run, improve spontaneously, i.e.

$$\mathcal{P}(\mathrm{B}) = 0.5 \text{ and } \mathcal{P}(\mathrm{W}) = 1 - \mathcal{P}(\mathrm{B}) = 0.5. \qquad (3.2.1)$$

2. Suppose that drug Y is effective, and that the percentage of
patients improving in the long run is increased to 90 per cent. Thus

$$\mathcal{P}(\mathrm{B}) = 0.9 \text{ and } \mathcal{P}(\mathrm{W}) = 1 - \mathcal{P}(\mathrm{B}) = 0.1. \qquad (3.2.2)$$

In both cases, because the outcomes B and W are mutually exclusive,
the special case of the addition rule (2.4.2) gives $\mathcal{P}(\mathrm{B\ or\ W}) =
\mathcal{P}(\mathrm{B}) + \mathcal{P}(\mathrm{W})$, and because B and W are exhaustive (the only possible
outcomes) $\mathcal{P}(\mathrm{B\ or\ W}) = 1$.

Two trial administrations of the drug (n = 2) 

Out of two trials 0, 1, or 2 successes might be observed. The possible
outcomes of the two trials are shown in Table 3.2.1 and from these the
probabilities, $P(r)$, of observing r successes (r = 0, 1, or 2) are calculated
using the multiplication rule, (2.4.6), and the addition rule (2.4.2).

Table 3.2.1

 r    1st     2nd     Prob. of                                   P(r) when                P(r) when
      trial   trial   outcome                                    $\mathcal{P}(\mathrm{B}) = 0.5$     $\mathcal{P}(\mathrm{B}) = 0.9$

 0     W       W      $\mathcal{P}(\mathrm{W}) \times \mathcal{P}(\mathrm{W})$      0.25                     0.01
 1     W       B      $\mathcal{P}(\mathrm{W}) \times \mathcal{P}(\mathrm{B})$      0.25 } 0.5               0.09 } 0.18
 1     B       W      $\mathcal{P}(\mathrm{B}) \times \mathcal{P}(\mathrm{W})$      0.25 }                   0.09 }
 2     B       B      $\mathcal{P}(\mathrm{B}) \times \mathcal{P}(\mathrm{B})$      0.25                     0.81

Total                                                            1.0                      1.0

It can be seen that $\sum_{r=0}^{n} P(r) = 1.0$ in each case, as it should by the
addition rule, because it is certain that r will take some value between
0 and n.

It is also clear from the table that the calculations are affected by
the number of ways in which a given result can occur. One success
out of two can occur in two ways, either at the first trial or at the
second, so the probability of one success out of two trials, if the order
in which they occur is immaterial, is 0.5 when $\mathcal{P}(\mathrm{B}) = 0.5$ (drug X),
and 0.18 when $\mathcal{P}(\mathrm{B}) = 0.9$ (drug Y). This follows from the addition
rule, being the probability of either (B at first trial and W at second)
or (W at first trial and B at second).
The mean number (expectation, see Appendix 1) of successes out of
n trials is $n\mathcal{P}$, i.e. 1 success out of 2 trials when $\mathcal{P}(\mathrm{B}) = 0.5$ (drug X),
and 1.8 successes out of two trials when $\mathcal{P}(\mathrm{B}) = 0.9$ (drug Y). The
results in the table are plotted in Figs. 3.2.1 and 3.2.2.

Fig. 3.2.1. Binomial distribution of r, the number of successes out of
n = 2 trials of an event, the probability of success at each trial
being $\mathcal{P} = 0.5$.

Fig. 3.2.2. As in Fig. 3.2.1, but $\mathcal{P} = 0.9$.

Three trial administrations of the drug (n = 3)

Calculations exactly similar to those above are shown in Table 3.2.2 for the case when three trial administrations of the drug are performed. In this case there are three possible orders in which one success may occur in three trials, and the same number for two successes in three trials. Check the figures in the table to make sure you have got the idea.

These distributions are plotted in Figs. 3.2.3 and 3.2.4. 



Fig. 3.2.3. Binomial distribution of r, the number of successes out of n = 3 trials of an event, the probability of success at each trial being $\mathcal{P}$ = 0.5.

Fig. 3.2.4. As in Fig. 3.2.3 but $\mathcal{P}$ = 0.9.

Table 3.2.2

                                 P(r) when $\mathcal{P}$(B) = 0.5    P(r) when $\mathcal{P}$(B) = 0.9
       1st     2nd     3rd       (Drug X)                            (Drug Y)
r      trial   trial   trial     outcome     P(r)                    outcome     P(r)

0      W       W       W         0.125       0.125                   0.001       0.001
1      W       W       B         0.125  }                            0.009  }
1      W       B       W         0.125  }    0.375                   0.009  }    0.027
1      B       W       W         0.125  }                            0.009  }
2      B       B       W         0.125  }                            0.081  }
2      B       W       B         0.125  }    0.375                   0.081  }    0.243
2      W       B       B         0.125  }                            0.081  }
3      B       B       B         0.125       0.125                   0.729       0.729

Total                                        1.000                               1.000
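
The enumeration behind Tables 3.2.1 and 3.2.2 can be sketched in a few lines of Python. This is only an illustration and not part of the original text; the function name `distribution_by_enumeration` is hypothetical. Every ordered sequence of B and W is listed, its probability is found by the multiplication rule, and sequences with the same number of successes are pooled by the addition rule.

```python
from itertools import product


def distribution_by_enumeration(n, p_success):
    """Return {r: P(r)} by listing every ordered sequence of n trials."""
    probs = {r: 0.0 for r in range(n + 1)}
    for sequence in product("BW", repeat=n):
        # Multiplication rule: the trials are independent, so probabilities multiply.
        prob = 1.0
        for trial in sequence:
            prob *= p_success if trial == "B" else 1.0 - p_success
        # Addition rule: sequences with the same number of successes are pooled.
        probs[sequence.count("B")] += prob
    return probs


drug_x = distribution_by_enumeration(3, 0.5)
drug_y = distribution_by_enumeration(3, 0.9)
for r in range(4):
    print(r, round(drug_x[r], 3), round(drug_y[r], 3))
```

Calling the same function with n = 2 gives the entries of Table 3.2.1.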



3.3. Illustration of the danger of drawing conclusions from 
small samples 

Suppose that it is wished to compare the treatments, X and Y, used in the previous section (see (3.2.1) and (3.2.2)). An experiment is performed by testing three subjects with treatment X and three subjects with treatment Y, the subjects being randomly selected from the population of subjects, and randomly allocated to X or Y using random number tables (see § 2.3). The probabilities of obtaining r successes in each set of 3 trials have already been given in Table 3.2.2 and are reproduced in Table 3.3.1 together with the products which, by the multiplication rule, give the probabilities of observing both (r successes with X) and (r successes with Y).



Table 3.3.1

          P(r) when $\mathcal{P}$(B) = 0.5    P(r) when $\mathcal{P}$(B) = 0.9
r         (treatment X)                       (treatment Y)                       product

0         0.125                               0.001                               0.000125
1         0.375                               0.027                               0.010125
2         0.375                               0.243                               0.091125
3         0.125                               0.729                               0.091125

Totals    1.000                               1.000                               0.1925

The sum of the products, 0.1925, gives, by the addition rule, the probability of obtaining either (0 successes with both drugs) or (1 success with both) or (2 successes with both) or (3 successes with both). Thus in 19.25 per cent of experiments in the long run, treatment X will appear to be equi-effective with treatment Y, though in fact the latter is considerably better.

Furthermore, in some experiments X will actually produce a better result than Y. By enumerating the ways in which this can happen, and applying the addition and multiplication rules, the probability of this outcome is seen to be

$$(0.375 \times 0.001) + 0.375(0.027 + 0.001) + 0.125(0.243 + 0.027 + 0.001) = 0.04475.$$

For example, the second term is the probability of obtaining both (2 successes with X) and (either 0 or 1 successes with Y). The treatments will be placed in the wrong order of effectiveness in 4.475 per cent of trials in the long run.

The result of these calculations is the prediction that in the long run X will appear to be as good as, or even better than, Y in 19.25 + 4.475 = 23.7 per cent of experiments. It would thus be quite likely that a good new treatment would remain undetected if an experiment were conducted with samples as small as those in this illustration. The hazards of small samples are dealt with further in § 7.7 and in § 7.8, which describes the use of the binomial for the assay of purity in heart.
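
The comparison of the two treatments can be checked by enumerating every pair (successes with X, successes with Y). The sketch below is an illustration only; the helper `binomial_pmf` and the variable names are hypothetical, and the two sets of three trials are assumed to be independent, as in the text.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


n = 3
p_x, p_y = 0.5, 0.9  # true success probabilities for treatments X and Y
prob_x = [binomial_pmf(r, n, p_x) for r in range(n + 1)]
prob_y = [binomial_pmf(r, n, p_y) for r in range(n + 1)]

# Multiplication rule for each pair of results, addition rule over the pairs.
p_tie = sum(prob_x[r] * prob_y[r] for r in range(n + 1))
p_x_better = sum(
    prob_x[i] * prob_y[j]
    for i in range(n + 1)
    for j in range(n + 1)
    if i > j
)
p_y_better = 1.0 - p_tie - p_x_better

print(f"X and Y tie:    {p_tie:.5f}")
print(f"X looks better: {p_x_better:.5f}")
print(f"Y looks better: {p_y_better:.5f}")
```

Increasing `n` in this sketch shows how quickly the chance of a misleading result falls as the samples grow.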



3.4. The general expression for the binomial distribution and for 
its mean and variance 

The probability, P(r), of observing r successes out of n trials when the probability of a success is $\mathcal{P}$ (and the probability of a failure is therefore $1-\mathcal{P}$, from (2.4.3)), can be inferred by generalization of the deductions in § 3.2. It is

$$P(r) = \mathcal{P}^{r}(1-\mathcal{P})^{n-r} \tag{3.4.1}$$

if the order in which the successes occur is specified. Commonly the order is of no interest, and therefore, by the addition rule, this must be multiplied by the number of ways in which r successes can occur in n trials, namely

$$\frac{n!}{r!\,(n-r)!} \tag{3.4.2}$$

which is the number of possible combinations of r objects selected from n.† Thus, when the order of the successes is ignored,

$$P(r) = \frac{n!}{r!\,(n-r)!}\,\mathcal{P}^{r}(1-\mathcal{P})^{n-r}. \tag{3.4.3}$$

The proof that the sum of these probabilities, for all possible values of r from 0 to n, is 1 follows from the fact that (3.4.3) is a term in the expansion of $(\mathcal{Q}+\mathcal{P})^{n}$, where $\mathcal{Q} = 1-\mathcal{P}$, by the binomial theorem. Thus

$$\sum_{r=0}^{n} P(r) = (\mathcal{Q}+\mathcal{P})^{n} = 1^{n} = 1.$$

† This quantity is often denoted by the symbol $\binom{n}{r}$, or by $^{n}C_{r}$. It is the number of possible ways of dividing n objects into two groups containing r and n−r objects ('successes' and 'failures' in the present case). The n objects can be arranged in n! different orders (permutations), and in each case the first r selected for one group, the remaining n−r for the other. However the r! permutations of the objects within the first group, and the (n−r)! permutations within the second group, all result in the same division into two groups, hence the denominator of (3.4.2).


Example 1. If n = 3 and $\mathcal{P}$ = 0.5, then the probability of one success (r = 1) out of three trials is, using (3.4.3),

$$P(1) = \frac{3!}{1!\,2!}\,0.5^{1}\,0.5^{2} = 3 \times 0.125 = 0.375$$

as found already in Table 3.2.2.

Example 2. If n = 3 and $\mathcal{P}$ = 0.9, then the probability of three trials all being successful (r = 3) is, similarly,

$$P(3) = \frac{3!}{3!\,0!}\,0.9^{3}\,0.1^{0} = 1 \times 0.729 = 0.729$$

as found in Table 3.2.2 (because 0! = 1, see § 2.1, p. 10).
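
The general expression (3.4.3) translates directly into code. The following is a minimal sketch, not part of the original text; the function name `binomial_pmf` is hypothetical, and `math.comb` supplies the number of combinations (3.4.2).

```python
from math import comb


def binomial_pmf(r, n, p):
    """Eqn (3.4.3): probability of r successes in n trials, order ignored."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


example_1 = binomial_pmf(1, 3, 0.5)  # one success out of three, P = 0.5
example_2 = binomial_pmf(3, 3, 0.9)  # three successes out of three, P = 0.9
total = sum(binomial_pmf(r, 3, 0.9) for r in range(4))  # should be 1

print(example_1, example_2, total)
```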

Estimation of the mean and variance of the binomial distribution 

When it is required to estimate the probability of a success from experimental results the obvious method is to use the observed proportion of successes in the sample, r/n, as an estimate of $\mathcal{P}$. Conversely, the average number of successes in the long run will be n$\mathcal{P}$, as exemplified in § 3.2 (this can be found more rigorously using appendix equation (A1.1.1)).

If many samples of n were taken it would be found that the number of successes, r, varied from sample to sample (see § 3.2). Given a number of values of r this scatter could be measured by estimating their variance in the usual way, using (2.6.2). However, in the case of the binomial distribution (unlike the Gaussian distribution) it can be shown (see eqn (A1.2.7)) that the variance that would be found in this way can be predicted even from a single value of r, using the formula

$$\mathrm{var}(r) = n\mathcal{P}(1-\mathcal{P}) \tag{3.4.4}$$

into which the experimental estimate of $\mathcal{P}$, viz. r/n, can be substituted.

The meaning of this equation can be illustrated numerically. Take the case of n = 2 trials when $\mathcal{P}$ = 0.5, which was illustrated in § 3.2. The mean number of successes in 2 trials (long run mean of r) will be μ = n$\mathcal{P}$ = 2 × 0.5 = 1. Suppose that a sample of 4 sets of 2 trials were performed, and that the results were r = 0, r = 1, r = 1, and r = 2 successes out of 2 trials (that is, by good luck, the results were exactly typical of the population, each of these values of r being equiprobable, see Table 3.2.1). The variance of r could now be estimated using (2.6.3). (N = 4 is used in the denominator, not N − 1, because the population mean, μ, not the sample mean, is being used; see § 2.6.) It would be

$$\mathrm{var}(r) = \frac{\sum (r-\mu)^{2}}{N} = \frac{(0-1)^{2} + (1-1)^{2} + (1-1)^{2} + (2-1)^{2}}{4} = 0.5,$$

which is exactly the result found using (3.4.4); thus

$$\mathrm{var}(r) = n\mathcal{P}(1-\mathcal{P}) = 2 \times 0.5\,(1-0.5) = 0.5.$$
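
The same arithmetic can be written out as a short check (an illustrative sketch only; the variable names are not from the original text). The 'perfectly representative' sample of four results is compared with the prediction of (3.4.4).

```python
n_trials = 2
p_success = 0.5
mu = n_trials * p_success  # long-run mean number of successes

# Four sets of two trials that happen to be exactly typical of the population.
sample = [0, 1, 1, 2]

# Divide by N, not N - 1, because the population mean mu is used.
var_from_sample = sum((r - mu) ** 2 for r in sample) / len(sample)
var_from_formula = n_trials * p_success * (1 - p_success)  # eqn (3.4.4)

print(var_from_sample, var_from_formula)
```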




Fig. 3.4.1. Histogram: binomial distribution with n = 25 and $\mathcal{P}$ = 0.3. Mean number of successes out of 25 trials = n$\mathcal{P}$ = 7.5. Variance of r, var(r) = n$\mathcal{P}$(1−$\mathcal{P}$) = 5.25; σ(r) = √(5.25) = 2.29; var(r/n) = $\mathcal{P}$(1−$\mathcal{P}$)/n = 0.0084; σ(r/n) = √(0.0084) = 0.0917. Continuous distribution: calculated Gaussian ('normal') distribution with μ = 7.5 and σ = 2.29.


The agreement is only exact because the sample happened to be perfectly 
representative of the population. If the calculations are based on small 
samples the estimate of variance obtained from (3.4.4) will agree 
approximately, but not exactly, with the estimate from (2.6.2). A 
similar situation arises in the case of the Poisson distribution and a 
numerical example is given in §3.7. 

Results are often expressed as the proportion (r/n), rather than the number (r), of successes out of n trials. The variance of the proportion of successes follows directly from the rule (2.7.5) for the effect of multiplying a variable (r in this case) by a constant (1/n in this case). Thus, from (3.4.4),

$$\mathrm{var}(r/n) = \frac{\mathrm{var}(r)}{n^{2}} = \frac{\mathcal{P}(1-\mathcal{P})}{n}. \tag{3.4.5}$$

The use of these expressions is illustrated in Fig. 3.4.1 in which the abscissa is given in terms both of r and of r/n. As might be supposed from Figs. 3.2.1-3.2.4, the binomial distribution is only symmetrical if $\mathcal{P}$ = 0.5. However, Fig. 3.4.1 shows that as n increases the distribution becomes more nearly symmetrical even when $\mathcal{P} \neq 0.5$. The binomial distribution in Fig. 3.4.1 is seen to be quite closely approximated by the superimposed continuous and symmetrical Gaussian distribution (see Chapter 4), which has been constructed to have the same mean and variance as the binomial.

3.5. Random events. The Poisson distribution

Genesis of the distribution. Relationship to the binomial

The Poisson distribution describes the occurrence of purely random events in a continuum of space or time. The sort of events that may be described by the distribution (it is a matter for experimental observation) are the number of cells visible per square of haemocytometer, the number of isotope disintegrations in unit time, or the number of quanta of acetylcholine released at a nerve-ending in response to a stimulus. The Poisson distribution is used as a criterion of the randomness of events of this sort (see § 3.6 for examples). It can be derived in two ways.

First, it can be derived directly by considering random events, when (3.5.1) follows (using the multiplication rule for independent events, (2.4.6)) from the assumption that events occurring in non-overlapping intervals of time or space are independent. This derivation is given in § A2.2 (Chapter 5 should be read first). The independence of time intervals is part of the definition of random events (see Chapter 5 and Appendix 2).

Secondly, the Poisson distribution can be derived from the binomial distribution (§ 3.4). In the examples cited the number of 'successes' (e.g. disintegrations per second) can be counted, but it does not obviously make sense to talk about the 'number of trials of the event'. Consider an interval of time Δt seconds long (or an interval of space) divided into n small intervals. If the true (or population, or long-term) average number of events in Δt seconds is called m, then the probability of one event occurring ('success') in a small interval of length Δt/n is $\mathcal{P}$ = m/n.† Because of the independence of time intervals the n intervals are like n independent trials with a constant probability $\mathcal{P}$ = m/n of success at each trial, just like n tosses of a coin. These properties of independence and constancy define (plausibly enough) what is meant by 'random'. If n is finite, the number of successes in n trials is therefore given by the binomial distribution, (3.4.3), with $\mathcal{P}$ = m/n. In order to consider very short time intervals let n → ∞ (and thus $\mathcal{P}$ → 0) in eqn (3.4.3), so that m = n$\mathcal{P}$ remains fixed. The result is (3.5.1), a limiting form of the binomial distribution in which neither n nor $\mathcal{P}$, but only m, appears. The derivation is discussed by Feller (1957, p. 146), Mood and Graybill (1963, p. 70), and Lindley (1965, p. 73). It is easy to follow if it is remembered that $\lim_{n\to\infty}(1-m/n)^{n} = e^{-m}$. See Thompson (1965, Chapter 14) if it is not remembered.

† You may object that m could be bigger than n, giving a probability bigger than 1! But the argument only applies to very short intervals so that m < n and the chance of more than one event occurring in a short interval (length Δt/n) is negligible. For example, if Δt = 1 hour (3600 s) and m = 36 events/h, then if n = 3600 it follows that $\mathcal{P}$ = 36/3600 = 0.01. On average, 99 out of 100 1-s intervals contain no event ('failure'), 1 in 100 contains 1 event ('success') and a negligible proportion contains more than one event. The 'negligible proportion' is dealt with more rigorously in Appendix 2. It becomes zero if the intervals are made infinitely short, which is why we let n → ∞ in the derivation.

The distribution gives the true probability of observing r events per unit of time (or space) as

$$P(r) = \frac{m^{r}}{r!}\,e^{-m} \tag{3.5.1}$$

where m is the true mean number of events per unit of time or space. (It is shown in Appendix 1, (A1.1.7), that m is the population mean value of r.) This is a discontinuous distribution because r must be an integer. It has the basic property of all probability distributions that it should be certain (P = 1) that one or other of the possible outcomes (r = 0, r = 1, ...) will be observed. From the addition rule (2.4.2), this means that P[r = 0 or r = 1 or ... ∞] is the sum of the separate probabilities, i.e., from (3.5.1),

$$\sum_{r=0}^{\infty} P(r) = \sum_{r=0}^{\infty} \frac{m^{r}}{r!}\,e^{-m} = e^{-m}\left(1 + m + \frac{m^{2}}{2!} + \cdots\right) = e^{-m}e^{m} = 1. \tag{3.5.2}$$

(See Thompson (1966, p. 118) if you do not recognize the expansion of $e^{m}$.)


The variance of the Poisson distribution 

According to (3.4.4) the variance of the number of 'successes', r, for the binomial distribution, is var(r) = n$\mathcal{P}$(1−$\mathcal{P}$). Because m = n$\mathcal{P}$ this can be written m(1−$\mathcal{P}$), and because, as discussed above, the Poisson distribution can be derived from the binomial by letting $\mathcal{P}$ → 0, the variance of the Poisson becomes simply

$$\mathrm{var}(r) = m, \tag{3.5.3}$$

the same as the mean. As in the case of the binomial distribution, but not the normal distribution, this allows an estimate of variance to be made with even a single observation of r (a single estimate of m), as well as by the conventional method of estimation. This is illustrated numerically in § 3.7.

3.6. Some biological applications of the Poisson distribution 

Cell distribution 

If the number of cells per unit area of a counting chamber were 
observed to be Poisson-distributed this would imply that the cells 
were independent and randomly distributed, for example that they 
have no tendency to clump. 

Thus, if the number of red cells present in the volume represented by one small square of a haemocytometer is r, and the number of squares observed to contain r cells is f, then, using the observations in Table 3.6.1, the estimated mean number of cells per square is the total number of cells divided by the total number of squares, i.e.

$$\bar{r} = \frac{\sum fr}{\sum f} = \frac{531}{80} = 6.625.$$

The Poisson distribution (3.5.1) gives the probability of a square containing r cells as $P(r) = e^{-m}m^{r}/r!$, where m, the mean number of cells per square, is estimated by $\bar{r}$. For example, the probability of a square containing 3 cells is predicted to be

$$P(3) = e^{-6.625}\,\frac{(6.625)^{3}}{3!} \simeq 0.064.$$

Multiplying this probability by the number of squares counted (80) gives the predicted frequency ($f_{\mathrm{calc}}$) of squares containing 3 cells, viz. 80 × 0.064 = 5.1, i.e. about 5 squares. The rest of the values are given in Table 3.6.1. The observed distribution is slightly more clumped than the calculated Poisson distribution. In § 8.6, p. [133], a test is carried out to see whether this tendency can reasonably be attributed to random errors. For this purpose some categories are pooled as indicated by the brackets in Table 3.6.1.

Table 3.6.1

r             obs. freq. (f)    calc. freq.    fr

0             0                 0              0
1             0                 1              0
2             1                 2              2
3             3                 5              9
4             5                 9              20
5             10                11             50
6             15                12             90
7             20                12             140
8             17                10             136
9             6                 7              54
10            3                 5              30
11            0                 3              0
12 or more    0                 3              0

Totals        80                80             531
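
The calculated frequencies can be obtained from (3.5.1) as sketched below. This is an illustration only: the mean m = 6.625 and the 80 squares are taken as quoted in the text, and the function name `poisson_pmf` and the list `expected` are hypothetical. The last class, '12 or more', is found by difference so that the expected frequencies add up to the number of squares.

```python
from math import exp, factorial


def poisson_pmf(r, m):
    """Eqn (3.5.1)."""
    return m**r * exp(-m) / factorial(r)


m = 6.625        # estimated mean number of cells per square, as quoted in the text
n_squares = 80   # number of squares counted

# Expected number of squares containing r = 0, 1, ..., 11 cells.
expected = [n_squares * poisson_pmf(r, m) for r in range(12)]
# Everything left over belongs to the class '12 or more'.
expected.append(n_squares - sum(expected))

for r, freq in enumerate(expected):
    label = "12 or more" if r == 12 else str(r)
    print(f"{label:>10}  {freq:5.1f}")
```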



Bacterial dilutions 

If samples of a dilute suspension of bacteria are subcultured into several replicate tubes then bacterial growth will result in those tubes in which the added sample contained one or more viable bacteria. The proportion of tubes showing growth is therefore an estimate of the probability that a sample contains one or more organisms, P(r ≥ 1). If the bacteria in the sample suspensions were randomly and independently distributed throughout the suspending medium the number of bacteria in unit volume of solution (r) would follow the Poisson distribution; this enables an estimate of the mean number of cells per sample (m) to be made from the observed proportion of subcultures showing growth (P(r ≥ 1) ≃ p, say).

From (3.5.1) the probability of the sample being sterile (r = 0) is $P(0) = e^{-m}$ and therefore, by (2.4.3) (cf. (3.5.2)),

$$p = P(r \ge 1) = 1 - P(0) = 1 - e^{-m}.$$

By solving this for m the mean number of viable organisms per sample is estimated to be

$$m = -\log_{e}(1-p)$$

(remember $\log_{e} e^{x} = x$). For example, if 40 per cent of cultures are non-sterile, p is 0.4 and $m = -\log_{e}(1-0.4) = 0.51$ organisms per sample. The error of this estimate depends on the number of subcultures on which the estimate of p is based and is usually quite large.
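
The dilution estimate is a one-line calculation. In the following sketch (illustrative only; the function name `mean_organisms_per_sample` is hypothetical) the proportion of tubes showing growth is converted into the estimated mean number of organisms per sample, assuming a Poisson distribution as in the text.

```python
from math import exp, log


def mean_organisms_per_sample(p_growth):
    """Solve p = 1 - exp(-m) for m, assuming Poisson-distributed organisms."""
    if not 0 <= p_growth < 1:
        raise ValueError("the proportion showing growth must be in [0, 1)")
    return -log(1 - p_growth)


m_hat = mean_organisms_per_sample(0.4)  # 40 per cent of cultures non-sterile
print(round(m_hat, 2))

# Check: putting the estimate back into 1 - exp(-m) recovers p.
p_back = 1 - exp(-m_hat)
```

The estimate is undefined when every tube shows growth (p = 1), which is why the function rejects that value.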



The quantal release of acetylcholine at nerve terminals

In a low-calcium, high-magnesium medium the muscle end-plate (or post-synaptic) potential elicited by nerve stimulation is reduced in size because the number of quanta of acetylcholine released is reduced. A certain proportion of stimuli produce no response at all ('failures').

The number of quanta of acetylcholine released per stimulus has been found to be Poisson-distributed (see Martin 1966). In other words, the proportion of stimuli causing release of r quanta, P(r), is observed to be predicted well by (3.5.1). This is illustrated by an example given by Katz (1966). The mean response to a single quantum (mean of 78 spontaneous miniature end-plate potentials) was 0.4 mV. The mean of the responses to 198 nerve impulses was 0.933 mV, the individual responses tending to be either zero ('failures', r = 0) or integer multiples of 0.4 mV corresponding to the release of an integral number (r) of quanta. Assuming that the response (mV) is proportional to the number of quanta released, the mean number released is estimated to be m = 0.933/0.4 = 2.33 quanta per stimulus. The proportion of stimuli releasing r quanta is therefore predicted, from (3.5.1), to be $2.33^{r}e^{-2.33}/r!$. The predicted number of impulses out of 198 releasing r quanta is simply 198 times this proportion. The results in Table 3.6.2 show that the Poisson prediction agrees well with observations.



Table 3.6.2

Comparison of observed and Poisson distributions of the number of quanta of acetylcholine released per stimulus (Katz, 1966; based on Boyd and Martin, 1956)

r (number       predicted frequency        observed
of quanta)      $198\,m^{r}e^{-m}/r!$      frequency

0               19                         18
1               44                         44
2               52                         55
3               40                         36
4               24                         25
5               11                         12
6               5                          5
7               2                          2
8               1                          1
9               0                          0
...             ...                        ...

Total           198                        198

The predicted frequencies are only approximate because the observed mean (2.33) has been substituted for the population mean, m, in (3.5.1).

The observed frequencies also are only approximate because the response to a single quantum is itself quite variable (standard deviation 0.086 mV, coefficient of variation 100 × 0.086/0.4 = 21.5 per cent), so the responses (in mV) to 0, 1, 2, ... quanta overlap somewhat. Also, if the response is large, the depolarization (in mV) is no longer directly proportional to the number of quanta. The details are discussed by Martin (1966) and Katz (1966). When corrections are made for these factors the observed distribution of responses (in mV) is fitted closely by the calculated distribution.

Furthermore, assuming a Poisson distribution of r, m can be estimated from the observed number of failures, viz. 18 (from Table 3.6.2), because $P(0) = e^{-m}$ (from (3.5.1)). Thus $m = -\log_{e}P(0) = \log_{e}[1/P(0)] = \log_{e}(198/18) \simeq 2.4$ quanta per stimulus, agreeing quite well with the independent estimate 2.33 quanta, which was found above without assuming a Poisson distribution.
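
The two estimates of the quantal content can be reproduced as follows. This is a sketch for illustration, using only the figures quoted in the text; the variable names are hypothetical.

```python
from math import exp, log

mean_epp_mv = 0.933     # mean response to 198 nerve impulses (mV)
mean_quantum_mv = 0.4   # mean response to a single quantum (mV)
n_stimuli = 198
n_failures = 18         # stimuli producing no response

# Direct estimate: assumes only that response is proportional to quanta released.
m_direct = mean_epp_mv / mean_quantum_mv

# Estimate from the failures: assumes a Poisson distribution, P(0) = exp(-m).
m_failures = log(n_stimuli / n_failures)

# Number of failures predicted by the Poisson distribution with the direct estimate.
predicted_failures = n_stimuli * exp(-m_direct)

print(round(m_direct, 2), round(m_failures, 1), round(predicted_failures))
```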

Estimation of the quantal content, m, by the 'coefficients of variation method' 

If the depolarization produced by a single quantum (miniature end-plate potential) is denoted z, and the quantal content is m as above, then the end-plate potential can be represented (when S is small enough for the $z_{i}$ to be additive) by

$$S = \sum_{i=1}^{m} z_{i} \tag{3.6.1}$$

which is the sum of a variable number (m) of random variables (z). It is stated in (2.7.19), and proved in § A1.4, that if the miniature end-plate potentials are independent of each other (which is probably so), and if the end-plate potential, S, is produced by a random sample (of variable size, m) from the population of single quanta (which is less certain), then the square of the coefficient of variation of the end-plate potential size is given by

$$\mathcal{C}^{2}(S) = \frac{\mathcal{C}^{2}(z)}{m} + \mathcal{C}^{2}(m), \tag{3.6.2}$$

where $\mathcal{C}(z)$ and $\mathcal{C}(m)$ are the population coefficients of variation of z and m defined in (2.6.4). This result does not depend on assuming any particular distribution for either z or m (see § A1.4).

Suppose, for example, that m is binomially distributed, which might be expected if the nerve impulse caused there to be a constant probability $\mathcal{P}$ of releasing each of a population of N quanta, so the true mean number of quanta released is m = N$\mathcal{P}$ as in § 3.4, and, on average, a proportion $\mathcal{P}$ of the population is released. According to (3.4.4), var(m) = N$\mathcal{P}$(1−$\mathcal{P}$) = m(1−$\mathcal{P}$) and therefore $\mathcal{C}^{2}(m) = \mathrm{var}(m)/m^{2} = (1-\mathcal{P})/m$. Substituting this into (3.6.2) gives

$$\mathcal{C}^{2}(S) = \frac{\mathcal{C}^{2}(z)}{m} + \frac{1-\mathcal{P}}{m}. \tag{3.6.3}$$

Solving for m gives, in this case of binomial distribution of m,

$$m = \frac{\mathcal{C}^{2}(z) + 1 - \mathcal{P}}{\mathcal{C}^{2}(S)}. \tag{3.6.4}$$

The case where m is Poisson-distributed is obtained when $\mathcal{P}$ tends to zero (see § 3.5), or directly from (3.6.2) using var(m) = m from (3.5.3), i.e. $\mathcal{C}^{2}(m) = 1/m$, giving

$$m = \frac{\mathcal{C}^{2}(z) + 1}{\mathcal{C}^{2}(S)}. \tag{3.6.5}$$

This, and the other results in this section, are discussed in the review by Martin (1966). An estimate of m is obtained by substituting the experimental estimates of $\mathcal{C}(z)$ and $\mathcal{C}(S)$ into (3.6.5).

Equations (3.6.4) and (3.6.5) do not entirely account for the experimental observations and it was pointed out by del Castillo and Katz (see Martin 1966) that if we drop the rather unreasonable assumption that all the quanta have the same probability of release, then $\mathcal{C}^{2}(m)$ will be less than the binomial value $(1-\mathcal{P})/m$, which in turn is less than the Poisson value, 1/m. It can be shown (e.g. Kendall and Stuart (1963, p. 127)) that if each quantum has a different probability ($\mathcal{P}_{i}$) of release, and if these probabilities are constant from one nerve impulse to the next, then $\mathcal{C}^{2}(m) = [1 - \bar{\mathcal{P}} - \mathrm{var}(\mathcal{P})/\bar{\mathcal{P}}]/m$, where $\bar{\mathcal{P}}$ is the mean probability of a quantum being released (i.e. $\sum\mathcal{P}_{i}/N$) and var($\mathcal{P}$) is the variance of the $\mathcal{P}_{i}$ values (zero in the binomial case when all the $\mathcal{P}_{i}$ are identical). If this is substituted in (3.6.2), solving for m gives

$$m = \frac{\mathcal{C}^{2}(z) + 1 - \bar{\mathcal{P}} - \mathrm{var}(\mathcal{P})/\bar{\mathcal{P}}}{\mathcal{C}^{2}(S)}, \tag{3.6.6}$$

which is smaller than is given by either (3.6.4) or (3.6.5). In the case where N is very large (3.6.6), like (3.6.4), tends to the Poisson form (3.6.5) despite the variability of $\mathcal{P}_{i}$.

As an example consider the observations discussed above. The observed mean response to one quantum was $\bar{z}$ = 0.4 mV with standard deviation 0.086 mV, i.e. coefficient of variation C(z) = 0.086/0.4 = 0.215. The observed mean end-plate potential was $\bar{S}$ = 0.933 mV with a standard deviation of 0.634 mV (this value is taken for the purposes of illustration, the original figure not being available) and hence C(S) = 0.634/0.933 = 0.680. If m were Poisson-distributed its mean value could be estimated from (3.6.5) as

$$m = \frac{0.215^{2} + 1}{0.680^{2}} = \frac{1.046}{0.462} = 2.26,$$

which agrees quite well with the estimate (viz. 2.4) from the proportion of failures, which also assumes a Poisson distribution, and the direct estimate 0.933/0.4 = 2.33 which does not.
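
The coefficient-of-variation estimate can be sketched in code as below. This is an illustration under the assumptions stated in the text; the function name `quantal_content_cv` and its `p_release` argument are hypothetical. Setting `p_release` to zero gives the Poisson form (3.6.5), while a non-zero value gives the binomial form (3.6.4).

```python
def quantal_content_cv(cv_z, cv_s, p_release=0.0):
    """Estimate m from coefficients of variation.

    cv_z: coefficient of variation of the response to one quantum
    cv_s: coefficient of variation of the end-plate potential
    p_release: probability of release of each quantum (0 gives the Poisson case)
    """
    return (cv_z**2 + 1 - p_release) / cv_s**2


cv_z = 0.086 / 0.4    # single-quantum response: s.d. / mean
cv_s = 0.634 / 0.933  # end-plate potential: s.d. / mean (illustrative s.d.)

m_poisson = quantal_content_cv(cv_z, cv_s)
m_binomial = quantal_content_cv(cv_z, cv_s, p_release=0.2)  # arbitrary example value

print(round(m_poisson, 2), round(m_binomial, 2))
```

Because the unrounded coefficients of variation are used here, the result may differ in the last figure from a calculation made with the rounded values 0.215 and 0.680.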

The number of spontaneous miniature end-plate potentials in unit time 

The number of single quanta released in unit time is observed to 
follow the Poisson distribution, i.e. quanta appear to be released 
spontaneously in a random fashion. This phenomenon is discussed in 
Chapter 5, after continuous distributions have been dealt with, so
the continuous distribution of intervals between random events can be 
discussed. 

3.7. Theoretical and observed variances: a numerical example 
concerning random radioisotope disintegration 

The number of unstable nuclei disintegrating in unit time is observed 
to be Poisson-distributed over periods during which decay is negligible 
(see Appendix A2.5), and disintegration is therefore a random process 
in time. 

Since the variance of the Poisson distribution (3.5.3) is estimated by the mean number of disintegrations per unit time, the uncertainty of a count depends only on the number of disintegrations counted and not on how long they took to count, or on whether one long count or several shorter counts were done. The example is based on one given by Taylor (1967). The values of x listed are n = 10 replicate counts, each over a period of 5 min, of a radioactive sample. The decay over the period of the experiment is assumed to be negligible. The x values are

10536 10636 10398 10393 10586
10381 10479 10401 10262 10403

The total number of counts is Σx = 104475 counts/50 min, mean count $\bar{x}$ = Σx/n = 10447.5 counts/5 min, and count-rate = 10447.5/5 = 2089.5 counts/min. What is the uncertainty in the count-rate? Its variance can be calculated in two ways.

(a) Theoretical Poisson variance

The number of counts observed in a 50 min unit of time was 104475, so if the number of counts in unit time were Poisson-distributed the estimate of the variance of the variable 'number of counts in 50 minutes' would be 104475 (from (3.5.3)). In this case the total number of counts was the sum of ten 5-min counts. In general, according to (2.7.4), var(Σx) = n var(x), so if x is the number of counts in 5 min, the variance of a single 5-min count is estimated to be

$$\mathrm{var}(x) = \frac{\mathrm{var}(\sum x)}{n} = \frac{104475}{10} = 10447.5.$$

If there had only been one 5-min count, say the first one, its variance would have been estimated as 10536, a similar figure.

However, what is really wanted is the variance of the count-rate per minute, determined from 50 min of counting in the experiment, not the variance of a 5-min count. The count-rate is Σx/50 counts/min. In general, from (2.7.5), var(ax) = a² var(x), where a is a constant (1/50 in this case), therefore

$$\mathrm{var}\!\left(\frac{\sum x}{50}\right) = \frac{\mathrm{var}(\sum x)}{50^{2}} = \frac{104475}{50^{2}} = 41.79.$$

The standard deviation of the mean count-rate (2089.5 counts/min) is therefore √(41.79) = 6.46 counts/min.

If there had been only a single 5-min count, say 10536, the mean count-rate would have been 10536/5 = 2107.2 counts/min, and, by a similar argument, its estimated standard deviation would have been √(10536/5²) = 20.5 counts/min. Thus when the number of observations is reduced tenfold, the standard deviation of the mean goes up by √(10), as expected from (2.7.9) (6.46 × √(10) = 20.4).

It can be seen that the uncertainty in the count depends only on the total number of counts. If it is known that the count-rate has a Poisson distribution (as it will have if the counter is functioning correctly) its uncertainty can be estimated without having to do replicate observations.



(b) Observed variance

In this particular case there are replicate counts so the variance of an observation (a 5-min count) can be estimated in the usual way using (2.6.2),

$$\mathrm{var}(x) = \frac{\sum (x-\bar{x})^{2}}{n-1} = \frac{111938}{9} = 12437.5.$$

This is quite close to the estimate of 10447.5 found above. Because there are ten 5-min counts the estimate of count-rate will be based on the mean of these, the variance of which is estimated, using (2.7.8), to be

$$\mathrm{var}(\bar{x}) = \frac{\mathrm{var}(x)}{n} = \frac{12437.5}{10} = 1243.75.$$

And the variance of the mean count-rate per minute will be, from (2.7.5),

$$\mathrm{var}\!\left(\frac{\bar{x}}{5}\right) = \frac{\mathrm{var}(\bar{x})}{5^{2}} = \frac{1243.75}{25} = 49.75.$$

By using the scatter of replicate counts, the standard deviation of the count-rate (2089.5 counts/min) is therefore estimated to be √(49.75) = 7.05 counts/min. This estimate, which has not involved any assumption about the distribution of the observations, agrees well with the estimate (6.46 counts/min) calculated assuming that the count-rate was Poisson-distributed. This suggests that the assumption was not far wrong. With either estimate the coefficient of variation of the count-rate, by (2.6.4), comes out to about 0.3 per cent.
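
The two routes to the standard deviation of the count-rate can be compared in a few lines. The sketch below is illustrative only. The ten counts are transcribed from the text, with the fifth taken as 10586 so that they add up to the stated total of 104475; because of this transcription the observed variance may differ slightly in the last figures from the value printed above.

```python
from math import sqrt
from statistics import variance

# Ten replicate 5-min counts (transcribed; see the note above about the fifth value).
counts = [10536, 10636, 10398, 10393, 10586, 10381, 10479, 10401, 10262, 10403]
minutes_per_count = 5
total_counts = sum(counts)
total_minutes = minutes_per_count * len(counts)

count_rate = total_counts / total_minutes  # counts/min

# (a) Poisson assumption: var(total) = total, so var(total/50) = total/50**2.
sd_poisson = sqrt(total_counts) / total_minutes

# (b) Observed scatter: sample variance (n - 1 denominator) of the 5-min counts,
#     then the variance of their mean, then conversion to counts/min.
var_single_count = variance(counts)
sd_observed = sqrt(var_single_count / len(counts)) / minutes_per_count

print(f"count-rate            = {count_rate:.1f} counts/min")
print(f"s.d. (Poisson)        = {sd_poisson:.2f} counts/min")
print(f"s.d. (replicate data) = {sd_observed:.2f} counts/min")
```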



The effect of allowing for background count-rate

Counting equipment registers a background rate even when there is no sample in it and this must be subtracted from the sample count-rate. There is uncertainty in the background count as well as the sample count and this must be allowed for.

To illustrate what happens when the sample count-rate is not much above the background rate suppose that 20 000 background counts were recorded in 10 min, i.e. 2000 counts/min. The net count is thus 2089.5 − 2000 = 89.5 counts/min.

By arguments similar to those above:

estimated variance of background count/min = var(count/10) = var(count)/10² = count/10² = 20000/100 = 200.

The estimated variance of the net count-rate (sample minus background) is required. Because the counts are independent this is, by (2.7.3), the sum of the variances of the two count-rates

var(sample − background) = var(sample) + var(background)
                         = 49.75 + 200 = 249.75,

and the estimated standard deviation of the net count-rate (89.5 counts/min) is therefore √(249.75) = 15.8 counts/min. The coefficient of variation of the net count, by (2.6.4), is now quite large (17.7 per cent), and if the net count had been much smaller the difference between sample and background would have been completely swamped by the counting error (for a fuller discussion see Taylor (1967)).
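
The background correction can be followed step by step in code. This is a minimal sketch using the figures quoted in the text; the variable names are hypothetical, and the background count is assumed to be Poisson-distributed so that its variance is estimated by the count itself.

```python
from math import sqrt

sample_rate = 2089.5       # counts/min
var_sample_rate = 49.75    # variance of the sample count-rate (from replicate counts)

background_counts = 20000  # recorded in 10 min
background_minutes = 10

background_rate = background_counts / background_minutes
# var(count / 10) = var(count) / 10**2, and var(count) is estimated by the count.
var_background_rate = background_counts / background_minutes**2

net_rate = sample_rate - background_rate
# The variances of independent quantities add, even though the rates are subtracted.
var_net_rate = var_sample_rate + var_background_rate
sd_net_rate = sqrt(var_net_rate)
cv_percent = 100 * sd_net_rate / net_rate

print(net_rate, var_net_rate, round(sd_net_rate, 1), round(cv_percent, 1))
```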






4. Theoretical distributions. The 
Gaussian (or normal) and other 
continuous distributions 



'When it was first proposed to establish laboratories at Cambridge, Todhunter, 
the mathematician, objected that it was unnecessary for students to see experi- 
ments performed, since the results could be vouched for by their teachers, all of 
them men of the highest character, and many of them clergymen of the Church of 
England.' 

BERTRAND RUSSELL 1931

(The Scientific Outlook) 



4.1. The representation of continuous distributions in general 

So far only discontinuous variables have been dealt with. In many 
cases it is more convenient, though, because only a few significant
figures are retained, not strictly correct, to treat the experimental
variables as continuous. For example, changes in blood pressure,
muscle tension, daily urinary excretion, etc. are regarded as potentially 
able to have any value. The difficulties involved in dealing with 
this situation have already been mentioned in § 3.1 and will now be 
elucidated. 

The discontinuous distributions so far discussed have been repre- 
sented by histograms in which the height (along the ordinate) of the 
blocks was a measure of the probability or frequency of observing a 
particular value of the variable (along the abscissa). However, if one
asks 'What is the probability (or frequency) of observing a muscle
tension of exactly 2.0000 . . . g?', the answer must be that this probability
is infinitesimally small, and cannot therefore be plotted on the ordinate.
What can sensibly be asked is 'What is the probability (or frequency)
of observing a muscle tension between, say, 1.5 and 2.5 g?'. This
frequency will be finite, and if many observations of tension are made a
histogram can be plotted using the frequency (along the ordinate) of
making observations between 0 and 0.5 g, 0.5 and 1.0 g, etc., as shown
in Fig. 4.1.1. If there were enough observations it would be better to
reduce the width of the classes from 0.5 g to, say, 0.1 g as shown in
Fig. 4.1.2. This gives a smoother-looking histogram, but because there
are more classes the number (or probability) of observations falling
in a particular class will be reduced. The blocks will also be drawn
narrower though it will usually be convenient to keep them about the



Fig. 4.1.1. Histogram of muscle tensions; 'observations' grouped into
classes 0.5 g wide and proportion of observations in each class plotted as
ordinate. Abscissa: muscle tension (grams). Total height of all blocks = 1.0.

Fig. 4.1.2. Same 'observations' as Fig. 4.1.1 but grouped into narrower
classes (0.1 g), showing shape of distribution more clearly. Abscissa: muscle
tension (grams). Total height of all blocks = 1.0 as before.



same height, as shown. This suggests that it might be more convenient 
to represent the probability of an observation falling in a particular 
class by the area of the block rather than its height. If the width of 
the block (class interval) is constant then the area of the block is 
proportional to its height so in ordinary histograms the area is in fact
proportional to the probability. When the class width is reduced, the
reduction in width of the blocks will reduce their area, and hence the
probability they represent, without having to reduce their height. An
example of a histogram with unequal block widths occurs in §§ 14.1-14.3.
Fig. 14.2.3(a) shows it drawn with height representing probability
and Fig. 14.2.3(b) shows the preferable representation of the histogram 
with area representing probability. 

Using the convention that the probability is represented by the area 
of the blocks rather than their height, the condition that the sum of all 
the probabilities must be 1.0 is expressed by defining the total area of
the blocks as 1.0 (see (3.5.2) for example).




Fig. 4.1.3. Continuous distribution of muscle tensions. Abscissa: muscle
tension (grams), x. Ordinate is probability density, f(x), i.e. a function such
that the area under the curve represents probability. The total area under the
curve is 1.0. The probability of observing a value equal to or less than x is
denoted p, or F(x), and is the area under the curve up to x.

If the class intervals are made narrower and narrower, the probability 
attached to each becomes smaller. The probability of observing a 
muscle tension (x) between 1.999 and 2.001 g is small, and a very
large number of observations would be necessary to use intervals as
narrow as this. When the class interval becomes infinitesimally narrow
then the probability represented by each block (i.e. the probability
that x will lie within the interval dx) is also infinitesimal, say dP, and
the graph becomes continuous instead of being made up of finite 
blocks, as shown in Fig. 4.1.3. It now represents an infinite population 
and it can never be observed exactly. It is a mathematical idealization. 




The area of a block, i.e. the probability of x falling in the interval of
width dx (between x and x+dx), must now be written

$$dP = f(x)\,dx, \tag{4.1.1}$$

where the function f(x) is the ordinate of the curve shown in Fig. 4.1.3 
(i.e. the height of the block), and x is the continuous variable (e.g. 
blood pressure or muscle tension) the distribution of which is being 
defined (see, e.g., Thompson, 1965, if the notation of (4.1.1) is not 
understood). The function f(x) is known as the probability density 
function (or simply density) of x. A value of this function is called the
probability density of a particular value of x. It is not the probability
of that value of x, but merely a function that defines a curve such that 
the area under the curve represents probability. For example, the 
uniformly shaded area in Fig. 4.1.3, as a proportion of the whole 
area under the curve, is the probability that a value of x will lie 
between two specified values, x₁ and x₂. The summation of the
infinitesimal blocks of which this area is made up is handled
mathematically by integration so this area can be written as the integrated
form of (4.1.1),

$$P(x_1 \le x \le x_2) = \int_{x_1}^{x_2} f(x)\,dx. \tag{4.1.2}$$

Similarly, the probability that a value of x greater than x₂ will be
observed is equal to the area above the point x₂. How far along the
x-axis the distribution curve extends depends on the particular
distribution under consideration. The curve may reach the axis at some
finite minimum or maximum value of x, implying that observations
less or greater than this value are impossible; or the curve may, like
the Gaussian (or normal) distribution, be asymptotic to the x-axis so
that any value of x is allowed, though the probability of observing
values far removed from the mean soon becomes small. In the latter
case the probability of observing a value of x equal to or less than x₁
(the area under the distribution curve below x₁) would be written

$$P(x \le x_1) = \int_{-\infty}^{x_1} f(x)\,dx. \tag{4.1.3}$$

This area is said, in statistical jargon, to be the lower tail of the
distribution. It can be called p, or F(x₁), and is vertically shaded in Fig.
4.1.3. It depends, of course, on the value of x₁ chosen, i.e. it is a function
of x₁.






A more satisfactory way of writing the same thing is to use a special
symbol, say $\mathbf{x}$, to distinguish x considered as a random variable from
a particular value of the random variable, denoted simply x. The
probability of observing a value of the variable (e.g. muscle tension) equal
to or less than some specified value x (e.g. 2.0 g) as in (4.1.3), is written
in this notation as†

$$P(\mathbf{x} \le x) = \int_{-\infty}^{x} f(x)\,dx = F(x), \text{ or } p. \tag{4.1.4}$$

This is referred to as the distribution function of x, or as the cumulative
distribution. The area below x in Fig. 4.1.3, F(x), is plotted against x
in Fig. 4.1.4. Examples of cumulative distributions occur in §§ 5.1 and
14.2. The area, F(x), approaches 1.0 as x becomes very large, i.e. it is
almost certain that the variable (e.g. muscle tension) will be less than
a specified very large value (e.g. 100 kg). Differentiating (4.1.4) shows
that the distribution function is related to the probability density, as
suggested by (4.1.1), thus

$$\frac{dF(x)}{dx} = f(x).$$

Fig. 4.1.4. Distribution function, F(x), for the distribution shown in Fig.
4.1.3. The probability of observing a value of x or less is plotted against x. The
area between x₁ and x₂ in Fig. 4.1.3 is F(x₂) − F(x₁) = 0.988 − 0.894 = 0.094, the
probability of an observation falling between x₁ and x₂.

† Another, mathematically better, way of writing exactly the same thing
. . . The variable e does not appear in the final . . .
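
The relations (4.1.2) to (4.1.4) between a density and its distribution function can be checked numerically. The following is only a sketch: a Gaussian density with a mean of 2.0 g and a standard deviation of 0.5 g is assumed purely for illustration (these values do not come from the text), and the names `f`, `F`, and `area` are hypothetical.

```python
import math

MU, SIGMA = 2.0, 0.5  # assumed mean and standard deviation (grams), illustration only

def f(x):
    """Probability density f(x); a Gaussian is assumed for the sketch."""
    z = (x - MU) / SIGMA
    return math.exp(-0.5 * z * z) / (SIGMA * math.sqrt(2.0 * math.pi))

def F(x):
    """Distribution function F(x): area under f from minus infinity up to x."""
    return 0.5 * (1.0 + math.erf((x - MU) / (SIGMA * math.sqrt(2.0))))

def area(a, b, n=10_000):
    """Trapezoidal approximation to the integral of f between a and b."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return total * h

x1, x2 = 1.5, 2.5
# Area between x1 and x2 obtained two ways: by summing narrow blocks,
# and as the difference of two values of the distribution function.
print(area(x1, x2), F(x2) - F(x1))

# The slope of F at a point should equal the density at that point.
h = 1e-5
print((F(2.2 + h) - F(2.2 - h)) / (2.0 * h), f(2.2))
```

Each printed pair should agree closely, whatever density is substituted for `f`, provided `F` is its integral.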



4.2. The Gaussian, or normal, distribution. A case of wishful
thinking?

The assumption that the errors in real experimental observations 
can be described by the normal distribution (4.2.1) has dominated 
large areas of statistics for many years. The assumption is virtually 
always untested, and the extent to which it is mere wishful thinking
will be discussed in this section, after the distribution has been defined, 
and in § 6.2, where the merits of methods not involving the assumption 
of Gaussian errors are considered. 



Definition of the distribution 

The Gaussian distribution, often, but inappropriately, known as
the normal distribution, is defined by the probability density function
(see § 4.1)

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \tag{4.2.1}$$

where π has its usual value, and μ and σ are constants. The factor
1/(σ√(2π)) is a constant such that the total area under the curve
(from x = −∞ to x = +∞) is 1.0. The notation exp(x) is used to
stand for e^x when the exponent, x, is a long expression that would be
inconvenient to write as a superscript. If f(x) is plotted against x the
graph comes out as shown in Fig. 4.2.1.

It is a symmetrical bell-shaped curve asymptotic to, i.e. never quite
reaching, the x-axis. Being continuous it represents an infinite
population (see § 4.1). The constant μ is the population mean† and also the
population median and mode because the distribution is symmetrical and
unimodal; see §§ 2.5 and 4.5. The constant σ measures the width‡ of
the curve as shown in Fig. 4.2.1, i.e. it is a measure of the scatter of the
values of x, and is the population standard deviation of x. An estimate
of σ could be made from a sample of observations, taken from the
population represented by Fig. 4.2.1, using (2.6.2). The distribution is
completely defined by the two parameters μ and σ.

† This is proved in § A1.1.

‡ The distance from μ to the point of inflection (maximum slope) on each side of
the mean. Differentiating (4.2.1) twice with respect to x and equating to zero gives
x = μ ± σ. The population variance is defined in § A1.2.
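
Two properties of the density (4.2.1) stated above, that the total area under the curve is 1.0 and that the points of inflection lie one standard deviation on either side of the mean, can be verified numerically. This is a rough sketch only; the function names are hypothetical and the values μ = 6, σ = 3 are arbitrary choices for the check.

```python
import math

def gaussian_density(x, mu, sigma):
    """Gaussian probability density, eqn (4.2.1)."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))

def total_area(mu, sigma, half_width_sd=10.0, n=20_000):
    """Trapezoidal area under the curve from mu - 10 sigma to mu + 10 sigma."""
    a = mu - half_width_sd * sigma
    b = mu + half_width_sd * sigma
    h = (b - a) / n
    s = 0.5 * (gaussian_density(a, mu, sigma) + gaussian_density(b, mu, sigma))
    s += sum(gaussian_density(a + i * h, mu, sigma) for i in range(1, n))
    return s * h

def second_derivative(x, mu, sigma, h=1e-3):
    """Central-difference estimate of the second derivative of the density."""
    return (gaussian_density(x + h, mu, sigma)
            - 2.0 * gaussian_density(x, mu, sigma)
            + gaussian_density(x - h, mu, sigma)) / h ** 2

mu, sigma = 6.0, 3.0  # arbitrary illustrative values
print("total area:", total_area(mu, sigma))
# The curvature changes sign at mu - sigma and mu + sigma (points of inflection).
for x in (mu - sigma, mu, mu + sigma, mu + 2.0 * sigma):
    print(x, second_derivative(x, mu, sigma))
```

The second derivative is negative between μ − σ and μ + σ, positive outside, and close to zero at the two points of inflection.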

Is the widespread use of the normal distribution justified?

'Everybody firmly believes in it [the normal distribution] because the
mathematicians imagine that it is a fact of observation, and observers that it is a
theory of mathematics' (quoted by Poincaré 1892).

From the point of view of someone trying to interpret real observa- 
tions (and who else is statistics for ?) the only possible justification for 
the common assumption of normality would be the experimental 




Fig. 4.2.1. Gaussian ('normal') distribution. 4.6 per cent of the observations
in the population are more than two population standard deviations from the
mean (the shaded area is 4.6 per cent of the total area). The value 4.6 does not
apply to samples or, in general, to distributions other than the Gaussian (see
§§ 4.4 and 4.6).



demonstration that the methods based on this assumption give results 
that are correct, or at least sufficiently nearly correct for the purpose in 
hand. 

The truth is that no such demonstration exists. The many textbooks, 
elementary and not so elementary, describing methods that mostly 
depend on this assumption evade this awkward fact in a variety of 
ways. The more advanced books usually say something like 'If x were 
normally distributed then . . . would follow', which is true but not 
very helpful in real life. In more elementary books one often finds 
(to quote two) remarks such as 'It is not infrequently found that a
population represented in this way [i.e. by a Gaussian curve] is
sufficiently accurately specified for the purpose of the inquiry', or 'Many of
the frequency functions applicable to observed distribution do have a 
normal form'. Such remarks are, at least as far as most laboratory 
investigations are concerned, just wishful thinking. Anyone with 
experience of doing experiments must know that it is rare for the 
distribution of the observations to be investigated. The number of 
observations from a single population needed to get an idea of the form 
of the distribution is quite large — a hundred or two at least — so this is 
not surprising. In the vast majority of cases the form of the distribution 
is simply not known: and, in an even more overwhelming majority of 
cases there is no substantial evidence regarding whether or not the 
Gaussian curve is a sufficiently good approximation for the purposes of
the inquiry. It is simply not known how often the assumption of 
normality is seriously misleading. See § 4.6 for tests of normality. 

That most eminent amateur statistician, W. S. Gosset ('Student', see
§ 4.4), wrote, in a letter dated June 1929 to R. A. Fisher, the great
mathematical statistician, '. . . although when you think about it you
agree that "exactness" or even appropriate use depends on normality, 
in practice you don't consider the question at all when you apply your 
tables to your examples: not one word.' 

For these reasons some methods have been developed that do not 
rely on the assumption of normality. They are discussed in § 6.2. How- 
ever, many problems can still be tackled only by methods that involve 
the normality assumption, and when such a problem is encountered 
there is a strong temptation to forget that it is not known how nearly 
true the assumption is. A possible reason for using the Gaussian method 
in the absence of evidence one way or the other about the form of the 
distribution, is that an important use of statistical methods is to 
prevent the experimenter from making a fool of himself (see Chapters 
1, 6, and 7). It would be a rash experimenter who presented results that 
would not pass a Gaussian test, unless the distribution was definitely 
known to be not Gaussian. 

It is commonly said that if the distribution of a variable is not 
normal, the variable may be transformed to make the distribution 
normal (for example, by taking the logarithms of the observations, see 
§ 4.5). As pointed out above, there are hardly ever enough observations 
to find out whether the distribution is normal or not, so this approach 
can rarely be used. Transformations are discussed again in §§ 4.6, 11.2
(p. 176) and § 12.2 (p. 221). 






Various other reasons are often given for using Gaussian methods. 
One is that some Gaussian methods have been shown to be fairly immune 
to some sorts of deviations from normality, if the samples are not too 
small. Many methods involve the estimation of means and there is an 
ingenious bit of mathematics known as the central limit theorem that 
states that the distribution of the means of samples of observations will 
tend more and more nearly to the Gaussian form as the sample size 
increases whatever (almost) the form of the distribution of the observa- 
tions themselves (even if it is skew or discontinuous). These remarks 
suggest that when one is dealing with reasonably large samples, 
Gaussian methods may be used as an approximation. The snag is that 
it is impossible to say, in any particular case, what is a 'reasonable'
number of observations, or how approximate the approximation will 
be. 
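
The tendency described by the central limit theorem can be watched in a small simulation. The sketch below makes its own assumptions: the observations are drawn from an exponential distribution (a strongly skewed population chosen only for illustration), the sample sizes 1, 4, and 30 are arbitrary, and the skewness of the simulated means is used as a rough indication of departure from the Gaussian form. It illustrates the tendency only and does nothing to remove the snag mentioned above.

```python
import random

def skewness(values):
    """Moment coefficient of skewness (zero for a symmetrical distribution)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return sum(((v - mean) / sd) ** 3 for v in values) / n

def skewness_of_means(sample_size, n_samples=20_000, seed=1):
    """Skewness of the means of many random samples from an exponential population."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(sum(sample) / sample_size)
    return skewness(means)

# Hypothetical sample sizes; the skewness of the means shrinks as the size grows.
skews = {n: skewness_of_means(n) for n in (1, 4, 30)}
for n, g in skews.items():
    print(f"sample size {n:2d}: skewness of sample means = {g:.2f}")
```

How small the skewness must be before a Gaussian method is 'near enough' depends on the method and the purpose, which is exactly the difficulty raised in the text.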

Further discussion of the assumptions made in statistical calculations 
will be found particularly in §§ 6.2 and 11.2.



4.3. The standard normal distribution 

Applications of the normal distribution often involve finding the 
proportion of the total area under the normal curve that lies between 
particular values of the abscissa x. This area must be obtained by 
evaluating the integral (4.1.2), with the normal probability density 
function (4.2.1) substituted for f(x). The integral cannot be explicitly
solved. The answer comes out as the sum of an infinite number of terms
(obtained by expanding the exponential). In practice the only convenient
method of obtaining areas is from tables. For example, the Biometrika
Tables, Pearson and Hartley (1966, Table 1), give the area under the
standard normal distribution (defined below) below u (or the area
above −u which is the same), i.e. the area between −∞ and u (see
below). In this table u and the area are denoted X and P(X) respectively.
Fisher and Yates (1963, Table II1, p. 45) give the area above u (= area
below −u), the value of u being denoted x in this table.†

† Tables of Student's t (see § 4.4) give, on the line for infinite degrees of freedom, the
area below −u plus the area above +u, i.e. the area in both tails of the distribution of u.

If tables had to be constructed for a wide enough range of values of
μ and σ to be useful they would be very voluminous. Fortunately
this is not necessary since it is found that the area lying within any
given number of standard deviations on either side of the mean is the
same whatever the values of the mean and standard deviation. For
example it is found that:

(1) 68.3 per cent of the area under the curve lies within one standard
deviation on either side of the mean. That is, in the long run, 68.3
per cent of random observations from a Gaussian population would
be found to differ from the population mean by not more than one
population standard deviation.

(2) 95.4 per cent of the area lies within two standard deviations (or
95.0 per cent within ±1.96σ). The 4.6 per cent of the area outside
±2σ is shaded in Fig. 4.2.1.

(3) 99.7 per cent of the area lies within three standard deviations.
Only 0.3 per cent of random observations from a Gaussian population
are expected to differ from the mean by more than three standard
deviations.
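
The three areas just listed need not be taken on trust from tables. As a sketch, they can be computed from the error function, using the standard identity that the area of a Gaussian curve within k standard deviations of the mean is erf(k/√2). That identity and the function name are not part of the text.

```python
import math

def area_within(k):
    """Fraction of a Gaussian population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1.0, 1.96, 2.0, 3.0):
    inside = 100.0 * area_within(k)
    print(f"within ±{k} sd: {inside:.1f} per cent (outside: {100.0 - inside:.1f} per cent)")
```

Because the result depends only on k, and not on the mean or standard deviation, one table (or one function) serves for every Gaussian distribution.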

It follows that all normal distributions can be reduced to a single 
distribution if the abscissa is measured in terms of deviations from the 
mean, expressed as a number of population standard deviations. In 
other words, instead of considering the distribution of x itself it is 
simpler to consider the distribution of 

$$u = \frac{x-\mu}{\sigma}. \tag{4.3.1}$$

The distribution of u is called the standard normal distribution.
It is still a normal distribution because u is linearly related to the
normally distributed x (μ and σ being constants for any particular
distribution), but it necessarily† always has a mean of zero and a
standard deviation of 1.0. The numerator, x − μ, is a normally
distributed variable with a population mean of zero (because the long run
average value of x is μ) and variance σ². To illustrate this consider a
normally distributed variable x with population mean μ = 6 and
population standard deviation σ = 3. It can be seen from Fig. 4.3.1 that
the distribution of (x − μ), i.e. of (x − 6), has a mean of zero but a
standard deviation unchanged at 3 (cf. (2.7.7)), and that when this
quantity is divided by σ the standard normal distribution (mean = 0,
standard deviation = 1) results.

† See §§ A1.1 and A1.2. The standard form of a distribution is defined in § A1.2.




In terms of the standard normal distribution the areas (obtainable 
from the tables referred to above) become 

(1) 68.3 per cent of the area lies between u = −1 and u = +1
(and thus 15.85 per cent lies below −1, and 15.85 per cent above +1),

(2) 95 per cent of the area lies between u = −1.96 and u = +1.96,

(3) 99.7 per cent of the area lies between u = −3 and u = +3.

Fig. 4.3.1. Relation of normal distribution to standard normal distribution.
(a) x is normally distributed with population mean μ = 6 and population standard
deviation σ = 3. (b) (x − μ) is normally distributed with population mean = 0
and population standard deviation = 3. (c) u = (x − μ)/σ = (x − 6)/3 in this case,
is normally distributed with population mean = 0 and population standard
deviation = 1.
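
The conversion of an observation to a standard normal deviate, followed by finding an area, can be sketched as follows. The mean and standard deviation are those of the worked illustration in the text as printed; the particular x values and the function names are hypothetical, and the area is computed from the error function rather than looked up in tables.

```python
import math

def standardize(x, mu, sigma):
    """Standard normal deviate u = (x - mu) / sigma, eqn (4.3.1)."""
    return (x - mu) / sigma

def standard_normal_area_below(u):
    """Area under the standard normal curve between minus infinity and u."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

mu, sigma = 6.0, 3.0  # population values used in the illustration in the text
for x in (3.0, 6.0, 9.0, 11.88):  # arbitrary observations chosen for the sketch
    u = standardize(x, mu, sigma)
    print(f"x = {x:5.2f}  u = {u:+.2f}  area below = {standard_normal_area_below(u):.4f}")
```

The same two functions serve for any μ and σ, which is the practical meaning of reducing all normal distributions to a single standard one.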

In order to convert an observation x into a value of u = (x − μ)/σ, it
is, of course, necessary to know the values of μ and σ. In real life the
values of μ and σ will not generally be known, only more or less accurate
estimates of them, viz. x̄ and s, will be available. If the normal
distribution is to be used for induction as well as deduction this fact must be
allowed for, and the method of doing this is discussed in the next 
section. 

4.4. The distribution of t (Student's distribution) 

The variable t is defined as

$$t = \frac{x-\mu}{s(x)}, \tag{4.4.1}$$



where x is any normally distributed variable and s(x) is an estimate 
of the standard deviation of x from a sample of observations of x 
(see § 2.6). Tables of its distribution are referred to at the end of the 
section. It is the same as u defined in (4.3.1), except that the deviation
of a value of x from the true mean (μ) is expressed in units of the
estimated or sample standard deviation of x, s(x) (eqn (2.6.2)), rather
than the population standard deviation σ(x). As in § 4.3 the numerator,
(x − μ), is a normally distributed variable with population mean zero
(because the long run average value of x is μ, see Appendix, eqn
(A1.1.8)), and estimated standard deviation s(x).

The 'distribution of t' means, as usual, a formula, too complicated
to derive here, for calculating the frequency with which the value of t
would be expected to fall between any specified limits; see example 
below. The distribution of t was found by W. S. Gosset, who wrote 
many papers on statistical subjects under the pseudonym 'Student', in 
a classical paper called 'The probable error of a mean' which was 
published in 1908. 

Gosset was not a professional mathematician. After studying 
chemistry and mathematics he went to work in 1899 as a brewer at the 
Guinness brewery in Dublin, and he worked for this firm for the rest of 
his life. His interest in statistics had strong practical motives. The 
majority of statistical work being done at the beginning of the century 
involved large samples and the drawing of conclusions from small 
samples was regarded as a very dubious process. Gosset realized that
the methods used for dealing with large samples would need modification
if the results were to be applicable to the small samples he had to
work with in the laboratory.

Gosset spent a year (1906-7) away from the brewery, mostly working
in the Biometric Laboratory of University College London with
Karl Pearson, and in 1908 published a paper on the distribution of t. 

As an example, suppose that the normally distributed variable of
interest is x̄, the mean of a sample of 4 observations selected randomly
from a population of normally distributed values of x with population
mean μ and population standard deviation σ(x). The population
standard deviation of x̄ (or 'standard error', see § 2.7) will be σ(x̄)
= σ(x)/√4 (by (2.7.9)) and the population mean of x̄ will be μ, the
same as for x. (See Appendix 1, (A1.2.3).) Therefore if a very large
number of samples of 4 were taken, and if for each u = (x̄ − μ)/σ(x̄)
(from the definition (4.3.1)) were calculated, it would be found that in
the long run 95 per cent of the values of u would lie between u = −1.96
and u = +1.96, as discussed in § 4.3 and illustrated in Fig. 4.4.1.

Fig. 4.4.1. Continuous line: distribution of Student's t with 3 degrees of
freedom. Ninety-five per cent of values lie between −3.182 and +3.182 (see text
for example). The 5 per cent of the area outside these values is shaded. Broken
line: standard Gaussian (normal) distribution. 95 per cent of u values lie
between u = −1.96 and u = +1.96. The 5 per cent of the area outside these
values is shaded vertically. As the sample size (degrees of freedom) becomes very
large, the t distribution becomes identical with the standard normal distribution.






However, if σ(x) were not known, an estimate of it, s(x), could be
calculated from each sample of 4 observations using (2.6.2) as in the
example in § 2.7, and from each sample s(x̄) = s(x)/√4 obtained by
(2.7.9). For each sample t = (x̄ − μ)/s(x̄) (from the definition (4.4.1))
could be now calculated. The values of x̄ would be the same as those
used for calculating u, but the value of s(x̄) would differ from sample
to sample (whereas the same population value, σ(x̄), would be used in
calculating every value of u). The extra variability introduced by
variability of s(x̄) from sample to sample means that t varies over a
wider range than u, and it can be found from the tables referred to
below, that it would be expected that, in the long run, 95 per cent of
the values of t would lie between −3.182 and +3.182, as illustrated in
Fig. 4.4.1.
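
The thought experiment of taking 'a very large number of samples of 4' can be imitated by simulation. In the sketch below the population mean 0 and standard deviation 1 are arbitrary assumptions (the result does not depend on them), 50 000 samples stand in for 'a very large number', and the function names are hypothetical.

```python
import math
import random

def simulate(n_samples=50_000, n=4, mu=0.0, sigma=1.0, seed=1):
    """Return lists of u and t calculated from the means of repeated samples of n."""
    rng = random.Random(seed)
    sigma_mean = sigma / math.sqrt(n)          # population sd of the sample mean
    u_values, t_values = [], []
    for _ in range(n_samples):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(xs) / n
        # Sample standard deviation with n - 1 degrees of freedom.
        s = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))
        s_mean = s / math.sqrt(n)              # estimated sd of the sample mean
        u_values.append((mean - mu) / sigma_mean)
        t_values.append((mean - mu) / s_mean)
    return u_values, t_values

def fraction_within(values, limit):
    """Proportion of values lying between -limit and +limit."""
    return sum(1 for v in values if abs(v) <= limit) / len(values)

u_values, t_values = simulate()
print("u within ±1.96 :", fraction_within(u_values, 1.96))
print("t within ±1.96 :", fraction_within(t_values, 1.96))
print("t within ±3.182:", fraction_within(t_values, 3.182))
```

The second line of output shows what goes wrong if the limits ±1.96 are applied to t: noticeably fewer than 95 per cent of the values fall inside them, because t varies over a wider range than u.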

Notice that both the distributions in Fig. 4.4.1 are based on
observations from the normal distribution with population standard deviation
σ. The distribution of t, unlike that of u, is not normal, though it is
based on the assumption that x is normally distributed.

Although the definition of t (4.4.1) takes account of the uncertainty
of the estimate of σ(x), it still involves knowledge of the true mean μ
and it might be thought at first that this is a big disadvantage. It will
be found when tests of significance and confidence limits are discussed
that, on the contrary, everything necessary can be done by giving μ a
hypothetical value.

The use of tables of the distribution of t

The extent to which the distribution of t differs from that of u will
clearly depend on the size of the sample used to estimate s(x). The
appropriate measure of sample size, as discussed in § 2.6, is the number
of degrees of freedom associated with s(x). If s(x) is calculated from a
sample of N observations the number of degrees of freedom associated
with s(x) is N − 1 as in § 2.6. Clearly, t with an infinite number of
degrees of freedom is the same as u, because in this case the estimate
s(x) is very accurate and becomes the same as σ(x).

Fisher and Yates (1963, Table 3, p. 46, 'The distribution of t') denote 
the number of degrees of freedom n and tabulate values such that t 
has the specified probability of falling above the tabulated value or 
below minus the tabulated value. Looking in the table for n = 4 − 1
= 3 and P = 0.05 gives t = 3.182 as discussed in the example above,
and illustrated in Fig. 4.4.1 in which the 5 per cent of the area outside
t = ±3.182 is shaded.
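
For the particular case of 3 degrees of freedom the tabulated value can be reproduced without tables. The sketch below relies on a standard closed-form expression for the distribution function of t with 3 degrees of freedom, which is not given in the text, and finds the two-tail point by bisection. The function names are hypothetical, and other numbers of degrees of freedom would need a different formula or a statistical library.

```python
import math

def t3_area_below(t):
    """Distribution function of Student's t with 3 degrees of freedom.

    Standard closed form (an outside result, not from the text):
    F(t) = 1/2 + [z / (1 + z**2) + arctan(z)] / pi, where z = t / sqrt(3).
    """
    z = t / math.sqrt(3.0)
    return 0.5 + (z / (1.0 + z * z) + math.atan(z)) / math.pi

def two_tail_point(p, lo=0.0, hi=100.0):
    """Value such that the area above it plus the area below minus it equals p."""
    for _ in range(100):  # bisection on the two-tail area
        mid = 0.5 * (lo + hi)
        if 2.0 * (1.0 - t3_area_below(mid)) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(f"t for P = 0.05, 3 degrees of freedom: {two_tail_point(0.05):.3f}")
print(f"area in both tails beyond ±1.96: {2.0 * (1.0 - t3_area_below(1.96)):.3f}")
```

The second figure shows how much more than 5 per cent of the area lies outside ±1.96 when the standard deviation has been estimated from only 4 observations.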




The Biometrika Tables of Pearson and Hartley (1966, Table 12, 
p. 146, 'Percentage points of the t distribution') give the same sort of 
table. The number of degrees of freedom is denoted ν and the probability
2Q, Q being the shaded area in one tail of Fig. 4.4.1.

4.5. Skew distributions and the lognormal distribution

In § 4.2 it was stressed that the normal distribution is a mathe- 
matical convenience that cannot be supposed to represent real life 
adequately, and that it is very rare in experimental work for the 
distribution of observations to be known. In those cases where the
distribution has been investigated it has often been found to be
non-normal. Distributions may be more flat-topped or more sharp-topped
than the normal distribution, and they may be unsymmetrical.
Unsymmetrical distributions may have positive skew as in Fig. 4.5.1
(an even more extreme case is the exponential distribution Fig. 5.1.2), 
or negative skew, as in the mirror image of Fig. 4.5.1. 

Fig. 4.5.1. The lognormal distribution; a positively skewed probability
distribution. Abscissa: x. The mean value of x is greater than the median, and
the mode is less than the median. The 50 per cent of the area that lies (by
definition) below the median is shaded. For the lognormal distribution, in
general, mode = antilog₁₀(μ − 2.3026σ²) (= 5.81 in this example), median =
antilog₁₀ μ (= 10.0 in this example), mean = antilog₁₀(μ + 1.1513σ²) (= 13.1 in
this example), where μ and σ² are mean and variance of the (normal)
distribution of log₁₀ x shown in Fig. 4.5.2. Reproduced from Documenta Geigy
scientific tables, 6th edn, by permission of J. R. Geigy S.A., Basle, Switzerland.

In the case of symmetrical distributions (such as the normal) the 
population mean, median, and mode (see § 2.5) are all the same, but this 
is not so for unsymmetrical distributions. For example, when the 
distribution of x has a positive skew, as in Fig. 4.5.1, the population
mean is greater than the population median which is in turn larger
than the population mode. There is no particular reason to prefer the 
mean to the median or mode as a measure of the 'average' value of the 
variable in a case like this. A reason for preferring the median is men- 
tioned below (see also Chapter 14). The distribution of personal incomes 
has a positive skew so the most frequent income (the mode) is less than 
the mean income, and more people earn less than the mean income than 
earn more than the mean income, because incomes above the mean are, 



mode, median, mean 




Fio. 4.6.2. The distribution of log 10 x, when z is lognormally distributed as 
shown in Fig. 4.6.1. This distribution is normal (by definition of the lognonnal 
distribution). In this example the mean (= median = mode) of log 10 x ia 
f* — 10, and the standard deviation of \og i0 x la a - 0*32. See text and Chapter 
14. Reproduced from Documenta Oeigy scientific table*, 6th edn by permission of 

J. R. Oeigy S.A., Basle, Switzerland. 

on the whole, further from the mean than incomes below it, i.e. more 
than 50 per cent of the area under the curve is below the mean, as 
shown by the shading in Fig. 4.5.1. 

It is usually recommended that non-normal distributions be con- 
verted to normal distributions by transforming the scale of x (see 
§§4.2, 11.2, and 12.2). This should be done when possible, but in 
most experimental investigations there is not enough information to 
allow the correct transformation to be ascertained. In Chapter 14 
an example is given of a variable (individual effective dose of drug) 
with a positively skewed distribution (Fig. 14.2.1). In this particular 
example the logarithm of the variable is found to be approximately 
normally distributed (Fig. 14.2.3). In general, a variable is said to 
follow the lognormal distribution, which looks like Fig. 4.5.1, if the 
logarithm of the variable is normally distributed, as in Fig. 4.5.2.






In Chapter 14 the median value of the variable (rather than the
mean) is estimated. The median is unchanged by transformation, i.e.
the population median of the (lognormal) distribution of x is the
antilog of the population median (= mean = mode) of the (normal)
distribution of log x, whereas the population mode of x is smaller, and
the population mean of x greater, than this quantity (cf. (2.5.4)). For
example, in Fig. 4.5.2 the median = mean = mode of the distribution
of $\log_{10} x$ is 1.0, and the median of the distribution of x in Fig. 4.5.1
is $\mathrm{antilog}_{10}\,1 = 10$, whereas the mode is less than 10 and the mean larger
than 10.
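The ordering of mode, median, and mean can be checked numerically. The following is a minimal sketch in Python, assuming the values read from the caption of Fig. 4.5.2 (mean of $\log_{10} x$ = 1.0, standard deviation 0.32) and the standard formulae for the lognormal distribution; the function name `lognormal_summary` is invented here for illustration.

```python
import math

def lognormal_summary(mu10, sigma10):
    """Mode, median and mean of x when log10(x) is normal(mu10, sigma10)."""
    ln10 = math.log(10.0)
    m = mu10 * ln10      # mean of ln(x)
    s = sigma10 * ln10   # standard deviation of ln(x)
    mode = math.exp(m - s * s)
    median = math.exp(m)               # antilog of the median of log x
    mean = math.exp(m + s * s / 2.0)
    return mode, median, mean

# Values assumed from the caption of Fig. 4.5.2.
mode, median, mean = lognormal_summary(1.0, 0.32)
print(f"mode = {mode:.2f}, median = {median:.2f}, mean = {mean:.2f}")
```

Only the median is the simple antilog of the corresponding quantity on the logarithmic scale; the mode falls below it and the mean above it.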

Because of the rarity of knowledge about the distribution of observa- 
tions in real life these theoretical distributions will not be discussed 
further here, but they occur often in theoretical work and good accounts
of them will be found in Bliss (1967, Chapters 6-7) and Kendall and 
Stuart (1963, p. 168; 1966, p. 93). 

4.6. Testing for Gaussian distribution. Rankits and probits

If there are enough observations to be plotted as a histogram, like
Figs. 14.2.1 and 14.2.2, the probit plot described in §§ 14.2 and 14.3
can be used to test whether variables (e.g. x and log x in § 14.2) follow
the normal distribution. For smaller samples the rankit method
described, for example, by Bliss (1967, pp. 108, 232, 337) can be used.
For two way classifications and Latin squares (see Chapter 11) there is
no practicable test. It must be remembered that a small sample gives
very little information about the distribution; but consistent
non-linearity of the rankit plot over a series of samples would suggest a
non-normal distribution. The N observations are ranked in ascending
order, and the rankit corresponding to each rank is looked up in tables
(see Appendix, Table A9). Each observation (or any transformation of the
observation that is to be tested for normality) is then plotted against
its rankit.

The rankit corresponding to the smallest (next to smallest, etc.) observation
is defined as the long run average (expectation, see § A1.1) of the smallest (next
to smallest, etc.) value in a random sample of N standard normal deviates (values
of u, see § 4.3). Thus, if the observations (or their transformations) are normally
distributed, the observations and rankits should differ only in scale, and by
random sampling, so the plot should, on average, be straight.
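The definition of the rankit just given can be turned into a short calculation. The sketch below, in Python, evaluates the expectation of each order statistic of N standard normal deviates by numerical integration instead of looking it up in tables; the function name `rankits`, the integration limits, and the five observations are assumptions made only for illustration.

```python
import math
from statistics import NormalDist

def rankits(n, lo=-8.0, hi=8.0, steps=3200):
    """Expected values of the ordered values in a sample of n standard
    normal deviates, by the trapezoidal rule.

    The i-th rankit is the integral over u of
        u * i*C(n,i) * Phi(u)**(i-1) * (1 - Phi(u))**(n-i) * phi(u).
    """
    z = NormalDist()
    h = (hi - lo) / steps
    grid = [lo + k * h for k in range(steps + 1)]
    pdf = [z.pdf(u) for u in grid]
    cdf = [z.cdf(u) for u in grid]
    result = []
    for i in range(1, n + 1):
        coef = i * math.comb(n, i)
        total = 0.0
        for k, u in enumerate(grid):
            weight = 0.5 if k in (0, steps) else 1.0   # trapezoid end points
            total += (weight * u * coef * cdf[k] ** (i - 1)
                      * (1.0 - cdf[k]) ** (n - i) * pdf[k])
        result.append(total * h)
    return result

# Hypothetical observations, ranked and paired with their rankits for plotting.
observations = [4.1, 5.3, 2.9, 6.8, 5.0]
for rankit, obs in zip(rankits(len(observations)), sorted(observations)):
    print(f"{rankit:7.3f}  {obs}")
```

Plotting the second column against the first gives the rankit plot; with so few observations it says very little about the distribution, as remarked above.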






5. Random processes. The exponential
distribution and the waiting time
paradox



5.1. The exponential distribution of random intervals 

Dynamic processes involving probability theory such as queues,
Brownian motion, and births and deaths are called stochastic processes.
This subject is discussed further in Appendix 2. An example of interest
in physiology is the apparently random occurrence of miniature
post-junctional potentials at many synaptic junctions (reviewed by
Martin (1966)). It has been found that when the observed number (n)
of the time intervals between events (miniature end-plate potentials),
of duration equal to or less than t seconds is plotted against t, the
curve has the form shown in Fig. 5.1.1. Similar results would be
obtained with the intervals between radioisotope disintegrations; see
§ A2.5.

The observations are found to be fitted by an exponential curve,

$$n = N(1 - e^{-t/T}), \tag{5.1.1}$$

where N = total number of intervals observed and T = mean duration
of all N intervals (an estimate of the population mean interval, $\mathcal{T}$).

If the events were occurring randomly it would be expected that the 
number of events in unit time would follow the Poisson distribution, as 
described in §§ 3.5 and A2.2. How would the intervals between events
be expected to vary if this were so?

The true mean number of events in t seconds (called m in § 3.5) is
$t/\mathcal{T}$, which may be written as $\lambda t$, where $\lambda = 1/\mathcal{T}$ is the true mean
number of events in 1 second. Thus $\mathcal{T} = 1/\lambda$ is the mean number of
seconds per event, i.e. the mean interval between events (see (A1.1.11)).
According to the Poisson distribution (3.5.1) the probability that no
event (r = 0) occurs within time t from any specified starting point,
i.e. the probability that the interval before the first event is greater
than t, is $P(0) = e^{-m} = e^{-\lambda t}$. The first event must occur either at a
time greater than, or at a time equal to or less than t. Because these
cannot both happen it follows from the addition rule, (2.4.3), that

$$P[\text{interval} > t] + P[\text{interval} \le t] = 1$$

and thus

$$P[\text{interval} \le t] = F(t) = 1 - e^{-\lambda t} \quad (\text{for } t \ge 0). \tag{5.1.2}$$

(The distribution function, F, was defined in (4.1.4).)




[Figure 5.1.1: cumulative curve. Abscissa: duration of interval (as a multiple of the mean interval), $\lambda t$.]

Fig. 5.1.1. The cumulative exponential distribution (eqn. (5.1.2)). The
intervals between random events are observed to fall on a curve like this. The
abscissa is the interval duration expressed as a multiple of the population mean
interval, i.e. it is $t/\mathcal{T} = \lambda t$. For example, if the population mean were $\mathcal{T} = 10$ s
(i.e. $\lambda = 0.1\ \mathrm{s}^{-1}$), the graph shows that 63.21 per cent of intervals would be
10 s or shorter, and that 50 per cent of intervals (by definition of the median)
would be equal to or less than 6.93 s, the population median.

Multiplying this probability by N predicts the number of intervals
shorter than t as $N(1 - e^{-\lambda t})$, as observed (see (5.1.1)).

This implies that the exponential distribution is the distribution of
the interval between any specified point of time and the point at
which the next event occurs. And, in particular, it is the distribution of
the time interval between successive events (see § 5.2 and Appendix 2).
Because the intervals can be of any length this is a continuous
distribution, unlike the Poisson, and it has probability density (see § 4.1), using
(5.1.2) and (4.1.5),



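$$\begin{aligned}
f(t) = \frac{dF(t)}{dt} &= \frac{d}{dt}\left(1 - e^{-\lambda t}\right) = \lambda e^{-\lambda t} && (\text{for } t \ge 0), \\
&= 0 && (\text{for } t < 0).
\end{aligned} \tag{5.1.3}$$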



[Figure 5.1.2: probability density curve, ordinate running from 0.0 to 1.0. Abscissa: $\lambda t$, duration of interval (as a multiple of the mean interval), marked at 0.693 (population median), 1.0 (population mean), 2.0, 3.0 and 4.0.]

Fig. 5.1.2. The exponential distribution (an extreme case of the positive
skew illustrated in Fig. 4.5.1). Fifty per cent of the area under the curve lies
below the median. The area up to t is plotted against t in Fig. 5.1.1. The abscissa
is plotted in the same way as in Fig. 5.1.1. (If the abscissa is multiplied by
$\mathcal{T} = \lambda^{-1}$ to convert it to time units, the probability density would be divided by $\mathcal{T}$,
so the area under the curve remained 1.0.)



This exponential distribution of the lengths of random intervals is
plotted in Fig. 5.1.2. It is an extreme form of positively skewed
distribution (see § 4.5), the mode being zero, the mean $1/\lambda = \mathcal{T}$, and the
median $0.693\,\mathcal{T}$ (this is proved in Appendix 1, (A1.1.11) and (A1.1.14)).
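As a brief worked check of the median quoted above (a sketch only, not the proof given in Appendix 1), the distribution function (5.1.2) can be set equal to one half; the symbol $t_{\mathrm{med}}$ is introduced here purely for the purpose.

```latex
F(t_{\mathrm{med}}) = 1 - e^{-\lambda t_{\mathrm{med}}} = \tfrac{1}{2}
\;\Longrightarrow\;
e^{-\lambda t_{\mathrm{med}}} = \tfrac{1}{2}
\;\Longrightarrow\;
t_{\mathrm{med}} = \frac{\ln 2}{\lambda} \approx 0.693\,\mathcal{T}.
```

The density $\lambda e^{-\lambda t}$ is greatest at $t = 0$, which is why the mode is zero.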




Fig. 5.1.1 is the cumulative form, F(t) (see (4.1.4)), of the exponential
distribution (cf. Fig. 4.1.4, which is the cumulative form of the normal
distribution in Fig. 4.1.3). To obtain Fig. 5.1.1 from Fig. 5.1.2 notice
that the probability of observing an interval $\le t$ is given by the area
under the distribution curve (Fig. 5.1.2) below t, i.e. between 0 and t
(see § 4.1). This, using (4.1.4), is

$$P[0 < \text{interval} \le t] = F(t) = \int_0^t \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda t}, \tag{5.1.4}$$

which is (5.1.2) again. Further discussion will be found in Appendix 2.
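The agreement between the area under the density and the cumulative form can be confirmed numerically. The following is a small sketch in Python, assuming a population mean interval of 10 s (the value used as an illustration with Fig. 5.1.1); the function names are invented here.

```python
import math

def density(t, mean_interval):
    """Exponential probability density, eqn (5.1.3)."""
    lam = 1.0 / mean_interval
    return lam * math.exp(-lam * t) if t >= 0 else 0.0

def cumulative(t, mean_interval):
    """Closed form F(t) = 1 - exp(-t / mean_interval), eqn (5.1.2)."""
    return 1.0 - math.exp(-t / mean_interval) if t >= 0 else 0.0

def area_below(t, mean_interval, steps=10_000):
    """Area under the density between 0 and t by the midpoint rule, eqn (5.1.4)."""
    h = t / steps
    return h * sum(density((k + 0.5) * h, mean_interval) for k in range(steps))

MEAN = 10.0                      # assumed population mean interval, seconds
median = MEAN * math.log(2.0)    # 0.693 times the mean
print(round(100 * cumulative(MEAN, MEAN), 2), "per cent of intervals are 10 s or shorter")
print(round(median, 2), "s is the population median")
print(abs(area_below(MEAN, MEAN) - cumulative(MEAN, MEAN)) < 1e-6)
```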

A more complete discussion of the Poisson process would require
consideration of the distribution of the sum of n intervals. When this
is done it is seen that the observation of an exponential distribution
does not necessarily imply a Poisson distribution of events in unit time
unless the intervals are independent of each other. Independence has
been checked experimentally by Burnstock and Holman (1962). This
independence is one of the defining properties of the Poisson process
(see § 3.5 and Appendix 2).
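The connection between independent exponential intervals and Poisson counts can be illustrated by simulation. The sketch below, in Python, generates independent exponential intervals with an assumed rate of 2 events per unit time and counts the events falling in each unit of time; for a Poisson distribution the mean and variance of the counts should be about equal. The function name, the rate, and the seed are arbitrary choices made for illustration.

```python
import random

def counts_per_unit_time(rate, n_units, rng):
    """Count events in successive unit-time windows when the intervals
    between events are independent and exponential with mean 1/rate."""
    counts = [0] * n_units
    t = rng.expovariate(rate)
    while t < n_units:
        counts[int(t)] += 1          # window in which this event falls
        t += rng.expovariate(rate)   # independent interval to the next event
    return counts

rng = random.Random(1971)            # arbitrary seed, for repeatability only
counts = counts_per_unit_time(rate=2.0, n_units=20_000, rng=rng)
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(f"mean count = {mean:.3f}, variance of counts = {variance:.3f}")
```

A simulation of this kind shows only that independent exponential intervals behave as expected; it says nothing about whether real intervals are independent, which has to be checked experimentally.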

5.2. The waiting time paradox 

It was implied in § 5.1 that, for completely random events, the
average length of time from a randomly selected arbitrary point of
time (midday, for example) until the next event is the same (viz. $\mathcal{T}$)
as the average length of the interval between two events (both intervals
have the same exponential distribution). This is proved in § A2.6.
(An arbitrary point, in this context, means a point of time chosen by
any method that is independent of the occurrence of events.) It must
be so since the events in non-overlapping time intervals are supposed
independent, i.e. the process has no 'memory' of what has gone before.†
Yet it seems 'obvious' that, since the arbitrarily selected time is equally
likely to fall anywhere in the interval between two events, the average
waiting time from the selected time to the next event must be $\tfrac{1}{2}\mathcal{T}$.

For example, if buses were to arrive at a bus stop at random intervals,
with a mean interval of $\mathcal{T} = 10$ min, then a person arriving at the
bus stop at an arbitrary time might be supposed, on the average, to
have to wait 5 min for the next bus.‡ In fact, the true average waiting
time would be 10 min.

† See §§ 3.5, A2.1 and A2.2 for details.

‡ 5 min would be the right answer if the buses arrived regularly, not randomly, so
that all intervals were exactly 10 min.






The subtle flaw in the argument for a waiting time of $\tfrac{1}{2}\mathcal{T}$ lies in the
implicit assumption that the interval in which an arbitrarily selected
time falls is a random selection from all intervals. In fact, longer
intervals have a better chance of covering the selected time than
shorter ones, and it can be shown that the average length of the interval
in which an arbitrarily selected time falls is not the same as the average
length of all intervals, $\mathcal{T}$, but is actually $2\mathcal{T}$ (see § A2.7). Since the
selected time may fall anywhere in this interval, the average waiting
time is half of $2\mathcal{T}$, i.e. it is $\mathcal{T}$, the average length of all intervals, as
originally supposed. The paradox is resolved. In the bus example this
means that a person arriving at the bus stop at an arbitrary time would,
on average, arrive in a 20-min interval. On average, the previous bus
would have passed 10 min before his arrival (as long as this was not
too near the time when buses started running) and, on average, it
would be another 10 min until the next bus.

These assertions, which surprise most people at first, are discussed
(with examples of biological importance), and proved, in Appendix 2.
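Readers who remain unconvinced may find a simulation helpful. The following Python sketch generates buses at independent random (exponential) intervals with a mean of 10 min, lets passengers arrive at times chosen independently of the buses, and records both the wait for the next bus and the length of the interval in which each passenger arrives. The function name, the numbers of buses and passengers, and the seed are assumptions made for illustration; it is a demonstration, not the proof given in Appendix 2.

```python
import bisect
import random

def simulate_bus_stop(mean_interval, n_buses, n_passengers, rng):
    """Return (mean interval between buses, mean wait for the next bus,
    mean length of the interval in which a passenger arrives)."""
    arrivals, t = [], 0.0
    for _ in range(n_buses):
        t += rng.expovariate(1.0 / mean_interval)   # independent random intervals
        arrivals.append(t)
    waits, covering = [], []
    for _ in range(n_passengers):
        when = rng.random() * arrivals[-1]          # arbitrary time, independent of buses
        k = min(bisect.bisect_right(arrivals, when), n_buses - 1)  # index of next bus
        previous = arrivals[k - 1] if k > 0 else 0.0
        waits.append(arrivals[k] - when)
        covering.append(arrivals[k] - previous)
    return (arrivals[-1] / n_buses,
            sum(waits) / n_passengers,
            sum(covering) / n_passengers)

rng = random.Random(52)                             # arbitrary seed
mean_all, mean_wait, mean_covering = simulate_bus_stop(10.0, 100_000, 20_000, rng)
print(f"all intervals: {mean_all:.1f} min, wait: {mean_wait:.1f} min, "
      f"interval containing the passenger: {mean_covering:.1f} min")
```

The mean of all intervals and the mean wait should both come out near 10 min, and the mean length of the interval containing the passenger near 20 min.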






6. Can your results be believed? 

Tests of significance and the analysis 
of variance 



'. . . before anything was known of Lydgate's skill, the judgements on it had
naturally been divided, depending on a sense of likelihood, situated perhaps in
the pit of the stomach, or in the pineal gland, and differing in its verdicts, but not
less valuable as a guide in the total deficit of evidence.'

George Eliot
(Middlemarch, Chap. 46)



6.1. The interpretation of tests of significance

This has already been discussed in Chapter 1. It was pointed out 
that the function of significance tests is to prevent you from making a 
fool of yourself, and not to make unpublishable results publishable. 
Some rather more technical points can now be discussed. 

(1) Aids to judgement

Tests of significance are only aids to judgement. The responsibility 
for interpreting the results and making decisions always lies with the 
experimenter whatever statistical calculations have been done. 

The result of a test of significance is always a probability and should 
always be given as such, along with enough information for the reader 
to understand what method was used to obtain the result. Terms such 
as 'significant' and 'very significant' should never be used. If the reader
is unlikely to understand the result of a significance test then either 
explain it fully or omit reference to it altogether. 

(2) Assumptions 

Assumptions about, for example, the distribution of errors, must 
always be made before a significance test can be done. Sometimes 
some of the assumptions are tested but usually none of them are 
(see §§ 4.2 and 11.2). This means that the uncertainty indicated by the
test can be taken as only a minimum value (see §§ 1.1 and 7.2). The 
assumptions of tests involving the Gaussian (normal) distribution are 
discussed in §§11.2 and 12.2. Other assumptions are discussed when 
the methods are described. 




Some tests (nonparametric tests), which make fewer assumptions than
those based on a specified, for example normal, distribution (parametric
tests such as the t test and analysis of variance), are described in the
following sections. Their relative merits are discussed in § 6.2. Note,
however, that whatever test is used, it remains true that if the test 
indicates that there is no evidence that, for example, an experimental 
group differs from a control group then the experimenter cannot 
reasonably suppose, on the basis of the experiment, that a real difference 
exists. 

(3) The basis and the results of tests 

No statements of inverse probability (see § 1.3) are, or at any rate 
need be, made as a result of significance tests. The result, P, is always 
the probability that certain observations would be made given a 
particular hypothesis, i.e. if that hypothesis were true. It is not the 
probability that a particular hypothesis is true given the observations. 

It is often convenient to start from the hypothesis that the effect 
for which one is looking does not exist.† This is called a null hypothesis.
For example, if one wanted to compare two means (e.g. the mean 
response of a group of patients to drug A with the mean response of 
another group, randomly selected from the same population, to drug B) 
the variable of interest would be the difference between the two means. 
The null hypothesis would be that the true value of the difference was 
zero. The amount of scatter that would be expected in the difference 
between means if the experiment were repeated many times can be 
predicted from the experimental observations (see § 2.7 for a full 
discussion of this process), and a distribution constructed with this 
amount of scatter and with the hypothetical mean value of zero, as 
illustrated in Fig. 6.1.1. From this it can be predicted what would 
happen if the null hypothesis that the true difference is zero were true. 
In practice it will be necessary to allow for the inexactness of the 
experimental estimate of error by considering, for example, the 
distribution of Student's t (see §§ 4.4 and 9.4), rather than the distribution
of the difference between means itself. If the differences are
supposed to have a continuous distribution, as in Fig. 6.1.1, it is clearly 
not possible to calculate the probability of seeing exactly the observed 
difference (see § 4.1); but it is possible to calculate the probability of 
seeing a difference equal to or larger than the observed value. In the 
example illustrated this is P = 0.04 (the vertically shaded area) and
this figure is described as the result of a one-tail significance test. Its
interpretation is discussed in (4) below. It is the figure that would be
used to test the null hypothesis against the alternative hypothesis that
the true difference is positive. When the alternative hypothesis is that
the true difference is positive, the result of a one-tail test for the
difference between two means always has the following form.

If there were no difference between the true (population) means
then the probability of observing, because of random sampling
error, a difference between sample means equal to or greater
than that observed in the experiment would be P (assuming the
assumptions made in carrying out the test to be true).

† See p. 93 for a more critical discussion.



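[Figure 6.1.1: distribution of the difference between means, with the amount of scatter inferred from the experiment. Abscissa: value of difference between means, with negative and positive differences on either side of the hypothetical population difference, and the observed difference marked. The tail beyond the observed difference contains 4 per cent of the area (one-tail P).]

Fig. 6.1.1. [...] of significance tests. See text for [...]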


If the only possible alternative to the null hypothesis is that the 
true difference is negative, then the interpretation is the same, except 
that it is the probability (on the null hypothesis) of a difference being 
equal to or less than the observed one that is of interest. 

In practice, in research problems at least, the alternative to the null
hypothesis is usually not that the true difference is positive (or that it is
negative) but simply that it differs from zero† (in either direction),
because it is usually not reasonable to say in advance that only positive
(or negative) differences are possible (or that only positive differences
are of interest so the test is not required to detect negative differences).

† See also p. 93.




If the alternative to the null hypothesis is the hypothesis that the true
difference between means is, say, positive, this implies that however
large a negative difference was observed it would be attributed to
chance rather than a true (population) negative difference (or at least
that it would be considered of no interest if real).

Suppose now that it cannot be specified beforehand whether the true
difference between means is positive, zero, or negative. In the example
above there would be a probability of 0.04 of seeing a difference at least
as large as the positive difference observed in the experiment if the null
hypothesis were true. But there would also be a probability of 0.04 (the
horizontally shaded area) of seeing a deviation from the null hypothesis
at least as extreme as that actually observed but in the opposite
direction. The total probability of observing a deviation from the null
hypothesis (in either direction) at least as extreme as that actually
observed would be P = 0.04 + 0.04 = 0.08 if the null hypothesis were
true. This is the appropriate probability because, if it were resolved
to reject the null hypothesis as false every time an experiment gave a
difference between means as large as, or larger than, that observed in
this experiment, then, if the null hypothesis were actually true, it
would be rejected (wrongly) not in 4 per cent of repeated experiments,
but in 8 per cent. This is because negative observed differences in the
lower tail of Fig. 6.1.1, which would also lead to wrong rejection of the
null hypothesis, would be just as common, in the long run, as positive
differences. The probability is chosen so as to control the frequency of
this sort of error. This is discussed in more detail in subsection (6)
below.
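The arithmetic of the two tails can be sketched in a few lines of Python. The example assumes, for simplicity, that the difference between means is normally distributed with an accurately known standard deviation (a normal deviate test), which is not the Student's t calculation that would be needed in practice; the function name and the observed deviate of 1.75 are chosen only so that the one-tail area is close to the 0.04 of the example.

```python
from statistics import NormalDist

def tail_probabilities(observed_difference, sd_of_difference):
    """Area beyond the observed deviation in its own tail, and the total
    area in both tails, on the null hypothesis of zero true difference."""
    u = observed_difference / sd_of_difference   # standard normal deviate
    one_tail = 1.0 - NormalDist().cdf(abs(u))    # area cut off in one tail
    return one_tail, 2.0 * one_tail              # symmetrical: tails are equal

# Assumed illustration: observed difference 1.75 times its standard deviation.
one_tail, two_tail = tail_probabilities(1.75, 1.0)
print(round(one_tail, 2), round(two_tail, 2))
```

Doubling is legitimate here only because the assumed sampling distribution is symmetrical and continuous.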

The value P = 0.08 is described as the result of a two-tail test of
significance. Its interpretation is discussed in subsection (4) below. The
value of P is usually† twice that for a one-tail test. The result of a
two-tail test always has the following form.

If the null hypothesis were actually true then the probability of a
sample showing a deviation from it, in either direction, as
extreme,† or more extreme, than that observed in the experiment
would be P (assuming the assumptions made in carrying out the
test to be true).

† In the case of the normal distribution (§ 4.2), or any other distribution that is
symmetrical, whether continuous or discontinuous, for example the binomial distribution
with $\mathcal{P} = 0.5$ (§§ 3.2 and 3.4) or Student's distribution (§ 4.4), one could say here '. . . a
deviation from it, in either direction, as large as, or larger than, that observed in the
experiment . . .' In general, this simpler statement is not possible, however. Two
other cases must be considered. (1) The sampling distribution (e.g. Fig. 6.1.1) is
continuous but unsymmetrical (see § 4.5). In this case different sized positive and negative
deviations will be needed to cut off equal areas in the upper and lower tails (respectively)
of the distribution. It is the extremeness (i.e. rarity) of the deviation measured by the
area it cuts off in the tail of the distribution (rather than its size) that matters. The
two-tail probability is still twice the one-tail probability, however. (2) The sampling
distribution is both unsymmetrical and discontinuous (as often happens in the very
important sort of tests known as randomization tests, see §§ 8.2, 9.2, 9.3, and
10.2-10.4). A greater difficulty arises in this case because the most extreme observations in the
opposite tail of the distribution (that not containing the observation) will not generally
cut off an area exactly the same as that cut off by the observation in its own tail so P
for the two-tail test cannot be exactly twice that for the one-tail test. There is no
definite rule about what to do in this case. Most commonly a deviation is chosen in the
opposite direction to that observed that cuts off an area in the opposite tail not
greater than the value found in the one-tail test, so the two-tail P is not greater than
twice the one-tail P. However, it may be decided to choose a deviation that cuts off
an area in the opposite tail that is as near as possible to that of the one-tail test. This
is exemplified at the end of § 8.2 where the deviations of a from the null hypothetical
value are stated, to show exactly what has been done. With small unequal samples
the most extreme possible observation in the opposite tail may cut off an area far greater
than that in the one-tail test. This problem is discussed in § 8.2.

Notice that P is not the probability that the null hypothesis is true
but the probability that certain observations would be made if it were.

Perhaps the best popular interpretation of P is that it is the
'probability of the results occurring by chance'. Although this is inaccurate
and vague, and should therefore be avoided, it is not too misleading.

(4) Interpretation of the results

If P is very small the conclusion drawn is that either

(a) an unlikely event has taken place, the null hypothesis being
true. As Fisher (1951) said: '. . . no isolated experiment,
however significant in itself, can suffice for the experimental
demonstration of any natural phenomenon; for the "one chance in a
million" will undoubtedly occur, with no less and no more than
its appropriate frequency, however surprised we may be that it
should occur to us,' or

(b) the assumptions on which the test was based were faulty, for
example the samples were not drawn randomly, or

(c) the null hypothesis is not true, for example the true (population)
means in the above example are different, so that the drugs do
in fact differ in their effects on patients (see also subsection (7),
below).

Whether (b) can be ruled out, and what level of improbability
is enough to make one favour explanation (c) rather than (a), are
entirely matters for personal judgement. The calculations throw no
light whatsoever on these problems. It is often found in the biomedical
literature that P = 0.05 is taken as evidence for a 'significant
difference'. However 1 in 20 is not a level of odds at which most people would
want to stake their reputations as experimenters and, if there is no
other evidence, it would be wiser to demand a much smaller value
before choosing explanation (c).

A twofold change in the value of P given by a test should make
little difference to the inference made in practice. For example,
P = 0.03 and P = 0.06 mean much the same sort of thing, although
one is below and the other above the conventional 'significance level'
of 0.05. They both suggest that the null hypothesis may not be true
without being small enough for this conclusion to be reached with any
great confidence.

In any case, as mentioned above, no single test is ever enough. 
To quote Fisher (1951) again: 'In relation to the test of significance, we
may say that a phenomenon is experimentally demonstrable when we 
know how to conduct an experiment which will rarely fail to give us a 
statistically significant result'. 

(5) Generalization of the result 

Whatever the interpretation of the statistical calculations it is 
tempting to generalize the conclusion from the experimental sample to 
other samples (e.g. to other patients) ; in fact this is usually the purpose 
of the experiment. To do this it is necessary to assume that the new 
samples are drawn randomly from the same population as that from 
which the experimental samples were drawn. However, because of 
differences of, for example, time or place this must usually remain an 
untested assumption which will introduce an unknown amount of 
bias into the generalization (see §§ 1.1 and 2.3). 

(6) Types of error and the power of tests 

If the null hypothesis is not rejected on the basis of the experimental 
results (see subsection (7), below) this does not mean that it can be 
accepted. It is only possible to say that the difference between two 
means is not demonstrable, or that a biological assay is not demonstrably 
invalid. The converse, that the means are identical or that the assay is 
valid, can never be shown. If it could it would always be possible to find 
that there was, for example, 'no difference between two means' by
doing such a bad experiment that even a large real difference was not
apparent. Although this may seem gloomy, it is only common sense.
To show that two population means are identical exactly, the whole
population, usually infinite, is obviously needed.

An example. The supposition that a large P value constitutes evidence
in favour of the null hypothesis is, perhaps, one of the most frequent
abuses of 'significance' tests. A nice example appears in a paper just
received. The essence of it is as follows. Differences between membrane
potentials before and after applying three drugs were measured.
The mean differences ($\bar{d}$) are shown in Table 6.1.1.

Table 6.1.1

d stands for the difference between the membrane potentials (millivolts) in the
presence and absence of the specified drug. The mean of n such differences
is $\bar{d}$, and the observed standard deviation of d is s(d). The standard deviation
of the mean difference is $s(\bar{d}) = s(d)/\sqrt{n}$ and values of Student's t are calculated
as in § 10.6.

                 $\bar{d}$    s(d)    n     $s(\bar{d})$    t      P (approx)

Noradrenaline    2.7    10.1    40    1.60      1.7    0.1
Adrenaline       3.4    12.2    80    1.36      2.5    <0.02
Isoprenaline     3.9    10.8    60    1.39      2.8    <0.01


The potentials were about 90 mV so the percentage change is small,
but by doing many (n = 40-80) pairs of measurements, evidence was
found against the null hypothesis that adrenaline has no effect, using
the paired t test (see § 10.6). Similarly it was inferred that isoprenaline
increases membrane potential. These inferences are reasonable, though
the order in which treatments were applied was not randomized. In
contrast, the P value for noradrenaline was 0.1 and the authors
therefore inferred that 'noradrenaline had no effect on membrane potential',
i.e. that the null hypothesis was true. This is completely unjustified.
The apparent effect of noradrenaline, 2.7 mV, was not much smaller
than that for other drugs, and, although the significance test shows
that we cannot be sure that repeating the measurements would give
a similar result, it certainly does not show that we would not get
similar results. Suppose, perfectly plausibly, that 80 experiments had
been done with noradrenaline (as with adrenaline) instead of 40. And
suppose the mean difference was 2.7 mV and the standard deviation of
the differences was 10.1. In this case $t = 2.7/(10.1/\sqrt{80}) = 2.4$, giving
P < 0.02, a 'significant' result. The size of the difference, $\bar{d} = 2.7$ mV,
and the scatter of the observations, s(d) = 10.1, are just the same as in
Table 6.1.1, but despite this the authors would presumably have come
to the opposite conclusion. This is clearly absurd. But if the original
experiment with n = 40 differences had been interpreted as 'no evidence
for a real effect of noradrenaline' or 'effect, if any, masked by
experimental error' there would have been no trouble. It is reasonable that
the larger experiment should be capable of detecting differences that
escape detection in the smaller experiments.
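The calculations in this example are easily repeated. The Python sketch below computes Student's t as the mean difference divided by its standard deviation, for the three drugs and for the supposed experiment with 80 observations on noradrenaline. The P values it prints come from the normal curve, which is used here only as a rough stand-in for Student's distribution (an assumption that is tolerable because n is 40 or more); the function names are invented for illustration.

```python
import math
from statistics import NormalDist

def paired_t(mean_diff, sd_diff, n):
    """Student's t for the mean of n paired differences: d_bar / (s(d) / sqrt(n))."""
    return mean_diff / (sd_diff / math.sqrt(n))

def approx_two_tail_p(t):
    """Two-tail P from the normal curve; only a rough approximation to
    Student's distribution, reasonable here because n is large."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))

experiments = {
    "noradrenaline": (2.7, 10.1, 40),
    "adrenaline": (3.4, 12.2, 80),
    "isoprenaline": (3.9, 10.8, 60),
    "noradrenaline, n = 80 (supposed)": (2.7, 10.1, 80),
}
for drug, (d_bar, s, n) in experiments.items():
    t = paired_t(d_bar, s, n)
    print(f"{drug}: t = {t:.1f}, P roughly {approx_two_tail_p(t):.2f}")
```

The same mean difference and the same scatter give a larger t simply because n is larger, which is the point of the example.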

These ideas can be formalized by considering the power of a
significance test which is defined as the probability that the test will reject the
null hypothesis (e.g. that two population means are equal), this
probability being considered as a function of the true difference between the
means. For example, if the null hypothesis was always rejected
whenever a test gave P < 0.05 then, if the null hypothesis really were true it
would be rejected (wrongly) in 5 per cent of trials, as explained in
subsection (3) above (see subsection (7), below). The wrong rejection
of a correct hypothesis is called an error of the first kind, and, in this
case, the probability ($\alpha$) of an error of the first kind would be $\alpha = 0.05$.
If in fact there was a difference between true population means,
and this real difference was, for example, equal in size to the true
standard deviation of the difference between means (see §§ 2.7 and
9.4) (i.e. the difference, although real, is similar in size to the
experimental errors), then it can be shown that a two-tail normal deviate
test† would reject the null hypothesis (this time correctly) in 17 per
cent of experiments. However, if the null hypothesis was accepted as
true every time it was not rejected then it would be wrongly accepted
in 83 per cent of experiments. The wrong acceptance of a false hypothesis
is called an error of the second kind, and, in this case, the probability
($\beta$) of this sort of error is $\beta = 0.83$.

The power curve for a two-tailed normal deviate test for the difference 
between two means is shown in Fig. 6.1.2 and compared with the 
power curve for the (non-existent) ideal test that would always accept
true hypotheses and reject false ones. The power of even the best tests 
to detect real differences that are similar in size to the experimental 
error is quite small. 
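The figures of 17 and 83 per cent quoted above can be reproduced with a short calculation. The following Python sketch gives the power of a two-tail normal deviate test as a function of the true difference, expressed as a multiple of the standard deviation of the difference between means; the function name `power_two_tail` is an assumption made for illustration.

```python
from statistics import NormalDist

def power_two_tail(true_difference, alpha=0.05):
    """Probability that a two-tail normal deviate test rejects the null
    hypothesis; true_difference is in units of the standard deviation
    of the difference between means."""
    z = NormalDist()
    critical = z.inv_cdf(1.0 - alpha / 2.0)           # about 1.96 for alpha = 0.05
    upper = 1.0 - z.cdf(critical - true_difference)   # rejection in the upper tail
    lower = z.cdf(-critical - true_difference)        # rejection in the lower tail
    return upper + lower

for delta in (0.0, 1.0, 2.0, 3.0):
    power = power_two_tail(delta)
    print(f"true difference {delta}: power {power:.2f}, "
          f"probability of second-kind error {1.0 - power:.2f}")
```

With a true difference of zero the 'power' is simply the probability of an error of the first kind, 0.05.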

(7) Some more subtle points about significance tests 

The critical reader will, no doubt, bave some objections to the arguments 
presented in this section. It is difficult to give a consensus of informed opinion 

f A t test (see | 9.4) in which the standard deviation is accurately known (e.g. beoause 
the samples are large) so the standard normal deviate, u (see f 4.3), can be used in place 
of t (aee §4.4). 



Copyrighted material 



94 



Tests of significance and the analysis of variance 



§6.1 




I 

i 




(b) 



Fig. 6.1.2. In both figures the abscissa gives the difference between the
population means (expressed as a multiple of the standard deviation of the
difference between means: see § 9.4). (a) The power curve for a two-tail normal
deviate test for the difference between two means (see text) when $\alpha = 0.05$, i.e. the
null hypothesis is rejected whenever $P \le 0.05$, so if it were actually true it would
be wrongly rejected in 5 per cent of repeated experiments. If the null hypothesis
were false, i.e. there is a difference between the population means (in this example,
a difference equal in size to one standard deviation of the difference between
means: see § 9.4) the null hypothesis would be rejected (correctly) in 17 per cent
of experiments and not rejected (wrongly) in $\beta$ = 83 per cent of experiments.
(b) Power curve for the (non-existent) ideal test that always rejects a hypothesis
(population means equal) when it is false, and never rejects it when it is true.






because, although there is much informed opinion, there is rather little consensus.
A personal view follows. 

The first point concerns the role of the null hypothesis and the role of prior 
knowledge, i.e. knowledge available before the experiment was done. It is widely 
advocated nowadays (particularly by Bayesians, see §§ 1.3 and 2.4) that prior
information should be used in making statistical decisions. There is no doubt
that this is desirable. All relevant information should be taken into account in
the search for truth, and in some fields there are reasonable ways of doing this.
But in this book the view is taken that attention must be restricted to the
information that can be provided by the experiment itself. This is forced on us because,
in the sort of small-scale laboratory or clinical experiment with which we are 
mostly concerned, no one has yet devised a way that is acceptable to the scientist, 
as opposed to the mathematician, of putting prior information in a quantitative 
form. 

Now it has been mentioned already that in most real experiments it is
unrealistic to suppose that the null hypothesis† could ever be true, that two treatments
could be exactly equi-effective. So is it reasonable to construct an experiment
to test a null hypothesis? The answer is that it is a perfectly reasonable way of 
approaching our aim of preventing the experimenter from making a fool of 
himself if, as recommended above, we say only that 'the experiment provides 
evidence against the null hypothesis' (if P is small enough), or that 'the experiment 
does not provide evidence against the null hypothesis' (if P is large enough). 
The fact that there may be prior evidence, not from the experiment, against the 
null hypothesis does not make it unreasonable to say that the experiment itself 
provides no evidence against it, in those cases where the observations in the 
experiment (or more extreme ones) would not have been unusual in the (admit- 
tedly improbable) event that the null hypothesis was exactly true. 

And, because it has been stressed that if there is no evidence against the null 
hypothesis it does not imply that the null hypothesis is true, the inference from 
a large P value does not contradict the prior ideas about the null hypothesis. 
We may still be convinced on prior grounds that there is a real difference of some 
sort, but as it is apparently not large enough, relative to the experimental error 
and method of analysis, to be detected in the experiment, we have no idea of its 
size or direction. So the prior knowledge is of no practical importance. 

Another point concerns the discussion of power. It has been recommended
that the result of a significance test should be given as a value of P. It would be
silly to reject the null hypothesis automatically whenever P fell below an arbitrary
level (0.05, say). Each case must be judged on its merits. So what is the justification
for discussing in subsections (3) and (8) above, what would happen 'if the
null hypothesis were always rejected when $P \le 0.05$'? As usual, the aim is to
prevent the experimenter making a fool of himself. Suppose, in a particular case,
that a significance test gave P = 0.007, and the experimenter decided that, all
things considered, this should be interpreted as meaning that the experiment
provided evidence against the null hypothesis. Then it is certainly of interest to
the experimenter to know what would be the consequences of acting consistently
in this way, in a series of imaginary repetitions of the experiment in question.
This does not in any way imply that given a different experiment, under different
circumstances, the experimenter should behave in the same way, i.e. use
P = 0.007 as a critical level.

† This remark applies to point hypotheses, i.e. those stating that means, populations,
etc., are identical. All the null hypotheses used in this book are of this sort. 


6.2. Which sort of test should be used, parametric or 
nonparametric? 

Parametric tests, such as the t test and the analysis of variance, are
those based on an assumed form of distribution, usually the normal 
distribution, for the population from which the experimental samples 
are drawn. Nonparametric tests are those that, although they involve 
some assumptions, do not assume a particular distribution. A discussion 
of the relative 'advantages' of the tests is ludicrous. If the distribution 
is known (not assumed, but known; see § 4.6 for tests of normality), 
then use the appropriate parametric test. Otherwise do not. Neverthe- 
less the following observations are relevant. 

Characteristics of nonparametric methods 

(1) Fewer untested assumptions are needed for nonparametric me- 
thods. This is the main advantage, because, as emphasized in § 4.2, there 
is rarely any substantial evidence that observations follow a normal, 
or any other, distribution. The assumptions involved in parametric 
methods are discussed in § 11.2. Nonparametric methods do involve 
some assumptions (e.g. that two distributions are of the same, but 
unspecified, form), and these are mentioned in connection with in- 
dividual methods. 

(2) Nonparametric methods can be used for classification (Chapter 8) 
or rank (Chapters 9-11) measurements. Parametric methods cannot. 

(3) Nonparametric methods are usually easier to understand and use. 

Characteristics of parametric methods 

(1) Parametric methods are available for analysing more sorts of
experimental results. For example there are, at the moment, no widely 
available nonparametric methods for the more complex sort of analysis 
of variance or curve fitting problems. This is not relevant when choosing 
which method to use, because there is only a choice if a nonparametric 
method is available. 

(2) Many problems involving the estimation of population parameters 
from a sample of observations have so far only been dealt with by 
parametric methods. 

(3) It is sometimes listed as an advantage of parametric methods that 
if the assumptions they involve (see § 11.2) are true, they are more 
powerful (see § 6.1, para. (6)), i.e. more sensitive detectors of real 
differences, than nonparametric. However, if the assumptions are not 
true, which is normally not known, the nonparametric methods may 






well be more powerful, so this cannot really be considered an advantage. 
In any case, even when the assumptions of parametric methods are 
fulfilled the nonparametric methods are often only slightly less powerful. 
In fact the randomization tests described in §§ 9.2 and 10.3 are as 
powerful as parametric tests even when the assumptions of the latter 
are true, at least for large samples. 

There is a considerable volume of knowledge about the asymptotic 
relative efficiencies of various tests. These results refer to infinite sample 
sizes and are therefore of no interest to the experimenter. There is less 
knowledge about the relative efficiencies of tests in small samples. In 
any case, it is always necessary to specify, among other things, the 
distribution of the observations before the relative efficiencies of tests 
can be deduced; and because it is part of the problem that nothing is
known about this distribution, even the results for small samples are not 
of much practical help. Of the alternative tests to be described, each 
can, for certain sorts of distribution, be more efficient than the others. 

There is, however, one rather distressing consequence of lack of 
knowledge of the distribution of error, which is, of course, not abolished 
by assuming the distribution known when it is not. 

As an example of the problem, consider the comparison of the effects 
of two treatments, A and B. The experimenter will be very pleased if a 
large and consistent difference between the effects of A and B is 
observed, and will feel, reasonably, that not many observations are 
necessary. But it turns out that with very small samples it is impossible 
to find evidence against the hypothesis that A and B are equi-effective, 
however large, and however consistent, the difference observed be- 
tween their effects, unless something is known about the distributions 
of the observations. Suppose, for the sake of argument, that the 
experimenter is prepared to accept P = 1/20 (two tail) as small 
enough to constitute evidence against the hypothesis of equi-effectiveness
(see § 6.1). If the experiment is conducted on two independent
samples, each sample must contain at least 4 observations (for all the
nonparametric tests described in Chapter 9, q.v., the minimum possible
two-tail P value with samples of 3 and 4 would be $2 \times 3!\,4!/7! = 2/35 = 1/17.5$,
however large and consistent the difference between the samples).
Similarly, if the observations are paired, at least 6 pairs of observations
are needed; with 5 pairs of observations the
nonparametric methods described in Chapter 10, q.v., can never give a
two-tail P less than $2(\tfrac{1}{2})^{5} = 1/16$. (See also the discussion in §§ 10.6
and 11.9.)
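The minimum attainable P values quoted in this paragraph follow from simple counting, and can be checked with a few lines of code. This is a sketch only, under the assumption that the most extreme arrangement is the one observed and that both tails are counted. The function names are invented for illustration.

```python
from math import comb


def min_two_tail_p_independent(n1, n2):
    """Smallest possible two-tail P for two independent samples: the observed
    allocation is one of C(n1 + n2, n1) equally likely allocations, and the
    equally extreme allocation in the other tail is counted as well."""
    return 2 / comb(n1 + n2, n1)


def min_two_tail_p_paired(n_pairs):
    """Smallest possible two-tail P for paired observations: each difference
    can take either sign, giving 2**n_pairs equally likely sign patterns."""
    return 2 * 0.5 ** n_pairs


p_3_and_4 = min_two_tail_p_independent(3, 4)   # 2/35, i.e. 1/17.5
p_4_and_4 = min_two_tail_p_independent(4, 4)   # 2/70, i.e. 1/35
p_5_pairs = min_two_tail_p_paired(5)           # 1/16
p_6_pairs = min_two_tail_p_paired(6)           # 1/32

for label, p in [("samples of 3 and 4", p_3_and_4), ("samples of 4 and 4", p_4_and_4),
                 ("5 pairs", p_5_pairs), ("6 pairs", p_6_pairs)]:
    print(f"{label}: minimum two-tail P = {p:.4f}")
```

Only the second and fourth cases fall below 1/20, in agreement with the sample sizes stated in the text.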




In contrast, the parametric methods can give a very low P with the 
smallest samples if the difference between A and B is sufficiently large 
and consistent. Nevertheless, these facts mean that it is a disadvantage not 
to know the distribution of the observations. They do not constitute a 
disadvantage of nonparametric tests. The problem is less acute with 
samples larger than the minimum sizes mentioned. 

In view of these remarks it may be wondered why parametric tests 
are used at all when there are nonparametric alternatives. In fact they 
are still widely used even now. This is partly because of familiarity. 
The t test and analysis of variance were in use for many years before 
most nonparametric methods were developed. It probably also results 
from the sacrifice of relevance to the real world for the sake of mathe- 
matical elegance. Methods based on the assumption of a normal distribu- 
tion have been developed to cover a wide range of problems within a 
single, admittedly elegant, mathematical framework. 

It is not uncommon for those who are dubious about the assumptions 
necessary for parametric tests to be told something along the lines 
'experience has shown that the t test (for example) will not mislead 
us'. Unfortunately, as Mainland (1963) has pointed out, this is just 
wishful thinking. There is no knowledge at all of the number of times 
people have been misled by using the t test when they would not have 
been misled by a nonparametric test (see §§ 4.2 and 4.6). 

A plausible reason for using tests based on the normal distribution 
is that some of them have been shown to be fairly insensitive to some 
sorts of deviations from the assumptions on which they are based if the 
samples are reasonably big. The tests are said to be fairly robust. But 
this knowledge can usually be used only by intuition. One is never 
sure how large is large enough for the purposes in hand. When the 
nature and extent of deviations from the assumptions is unknown, 
the amount of error resulting from assuming them true is also unknown. 
It is much simpler to avoid as many as possible of the assumptions. 



If a nonparametric test is available it should be used in pref- 
erence to the parametric test, unless there is experimental evidence 
about the distribution of errors. 



In spite of what has just been said parametric methods are discussed
in the following chapters, even when nonparametric methods exist.
This is necessary as an approach to the more complex experimental 
designs, curve-fitting problems, and biological assay for which there are 




still hardly any nonparametric methods available, so parametric tests
or nothing must be used. Whichever test is used, it should be interpreted
as suggested in §§ 1.1, 1.2, 6.1, and 7.2, the uncertainty indicated
by the test being taken as the minimum uncertainty that it is reasonable 
to feel. 

6.3. Randomization tests 

The principle of randomization tests, also known as permutation 
tests, is of great importance because these tests are among the most 
powerful of nonparametric tests (see §§ 6.1 and 6.2). Moreover, they are
easier to understand, at the present level, than almost all other sorts 
of test and they make very clear the fundamental importance of 
randomization. Examples are encountered in §§ 8.2, 8.3, 9.2, 9.3, 
10.2, 10.3, 10.4, 11.5, 11.7, and 11.9. 
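The principle can be illustrated with a minimal sketch of a randomization test for two independent samples. It is not the worked procedure of the later chapters: the choice of the difference between means as the criterion, the function name, and the data are all assumptions made here for illustration. Every way of allocating the pooled observations to the two groups is enumerated, and P is the proportion of allocations giving a difference at least as large as that observed.

```python
from itertools import combinations


def randomization_test(sample_a, sample_b):
    """Two-tail randomization test for the difference between the means of
    two independent samples, by exact enumeration of all allocations."""
    pooled = list(sample_a) + list(sample_b)
    n_a = len(sample_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def mean_difference(sum_a):
        # Difference between group means when group A has the given total.
        return sum_a / n_a - (total - sum_a) / n_b

    observed = abs(mean_difference(sum(sample_a)))
    as_extreme = 0
    n_allocations = 0
    for indices in combinations(range(len(pooled)), n_a):
        n_allocations += 1
        difference = abs(mean_difference(sum(pooled[i] for i in indices)))
        if difference >= observed - 1e-12:   # small tolerance for rounding error
            as_extreme += 1
    return as_extreme / n_allocations


# Hypothetical responses to treatments A and B (not data from the text).
treatment_a = [10, 12, 14, 15]
treatment_b = [20, 22, 25, 27]
p_value = randomization_test(treatment_a, treatment_b)
print(f"two-tail P = {p_value:.4f}")
```

Because the two hypothetical samples do not overlap at all, only the observed allocation and its mirror image are as extreme as the result, so P takes the smallest value possible for samples of this size.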

6.4. Types of sample and types of measurement 

When comparing two groups the groups may be related or inde- 
pendent. For example, to compare drugs A and B two groups could be 
selected randomly (see § 2.3) from the population of patients, and one 
group given A, the other B. The two samples are independent. Inde- 
pendent samples are discussed in Chapters 8 and 9, and in §§ 11.4, 
11.5, and 11.9. On the other hand, the two drugs might both be given, 
in random order, to the same patient, or to a patient randomly selected 
from a pair of patients who had been matched in some way (e.g. by 
age, sex, or prognosis). The samples of observations on drug A and 
drug B are said to be related in this case. This is usually a preferable 
arrangement if it is possible; but it may not be possible because, for 
example, the effects of treatments are too long-lasting, or because of 
ignorance of what characteristics to match. Related samples are 
discussed in Chapter 10 and in §§ 8.6, 11.6, 11.7, and 11.9. 

The method of analysis will also depend on what sort of measure- 
ments are made. The three basic types of measurement are (1) classifica- 
tion (the nominal scale), (2) ranking (the ordinal scale), and (3) num- 
erical measurements (the interval and ratio scales). For further details 
see, for example, Siegel (1956a, pp. 21-30). If the best that can be 
done is classification as, for example, improved or not improved, worse 
or no change or better, passed or failed, above or below median, then 
the methods of analysis in Chapter 8 are appropriate. If the measure- 
ments cannot be interpreted in a quantitative numerical way but can 




be arranged (ranked) in order of magnitude (as, for example, with arbitrary 
scores such as those used for subjective measurements of the intensity 
of pain) then the rank methods described in §§9.3, 10.4, 10.5, 11.5, 
11.7, and 11.9 should be used. For quantitative numerical measurements 
the methods described in the remaining sections of Chapters 9-11 are 
appropriate. 

Methods for dealing with a single sample are discussed in Chapter 7 
and those for more than two samples in Chapter 11. 






7. One sample of observations. The 
calculation and interpretation of 
confidence limits 



'Eine Hauptursache der Armut in den Wissenschaften ist meist eingebildeter
Reichtum. Es ist nicht ihr Ziel, der unendlichen Weisheit eine Tür zu öffnen,
sondern eine Grenze zu setzen dem unendlichen Irrtum.'†

† 'One of the chief causes of poverty in science is usually imaginary wealth. The aim
of science is not to open a door to infinite wisdom, but to set a limit to infinite error'.

Galileo in Brecht's Leben des Galilei



7.1. The representative value: mean or median? 

It is second nature to calculate the arithmetic mean of a sample of 
observations as the representative central value (see §2.5). In fact this 
is an arbitrary procedure. If the distribution of the observations were
normal it would be a reasonable thing to do since the sample mean 
would be an estimate of the same quantity (the population mean 
= population median) as the sample median (§§2.5 and 4.5), and it 
would be a more precise estimate than the median. However, the 
distribution will usually not be known, so there is usually no reason to 
prefer the mean to the median. For more discussion of the estimation of 
'best' values see §§ 12.2 and 12.8 and Appendix 1. 

7.2. Precision of inferences. Can estimates of error be trusted?

The answer is that they cannot be trusted. The reasons why will 
now be discussed. Having calculated an estimate of a population median 
or mean, or other quantity of interest, it is necessary to give some sort 
of indication of how precise the estimate is likely to be. Again it is 
second nature to calculate the standard deviation of the mean — the 
so-called 'standard error' — see § 2.7. This is far from ideal because 
there is no simple way of interpreting the standard deviation unless the 
distribution of observations is known. If it were normal then the
confidence limits, sometimes called confidence intervals, based on the 
t distribution (§7.4) would be the ideal way of specifying precision 
since it allows for the fact that the sample standard deviation is itself 




only a more or less inaccurate estimate of the population value (see 
§ 4.4). 

As usual it must be emphasized that the distribution is hardly ever 
known, so it will usually be preferable to use the nonparametric 
confidence intervals for the median (§ 7.3), which do not assume a
normal distribution. 

No sort of confidence interval, nonparametric or otherwise, can 
make allowance for samples not having been taken in a strictly random 
fashion (see §§1.1 and 2.3), or for systematic (non-random) errors. 
For example, if a measuring instrument were wrongly calibrated so 
that every reading was 20 per cent below its correct value, this error 
would not be detectable and would not be allowed for by any sort of 
confidence limits. 

Therefore in the words of Mainland (1967a), confidence limits 
'provide a kind of minimum estimate of error, because they show how 
little a particular sample would tell us about its population, even if 
it were a strictly random sample'. It seems then that estimates cannot 
be trusted very far. To quote Mainland (1967 b) again, 

'Any hesitation that I may have had about questioning error estimates in 
biology disappeared when I recently learned more about error estimates in that 
sanctuary of scientific precision — physics. 

"One of the most disturbing things about scientific work is the failure of an
investigator to confirm results reported by an earlier worker. For example in 
the period 1896 to 1961, some 16 observations were reported on the magnitude 
of the astronomical unit (the mean distance from the earth to the sun). You will 
find these summarized in a table . . . which lists the value obtained by each 
worker and his estimates of plus or minus limits for the error of the estimate. 
It is both entertaining and shocking to note that, in every case, a worker's 
estimate is outside the limits set by his immediate predecessor. Clearly there is 
an unresolved problem here, namely, that experimenters are apparently unable 
to arrive at realistic estimates of experimental errors in their work" (Youden 
1963). 

If we add to the problems of the physicist the variability of biological and 
human material, and the nonrandomness of our samples from it, we may well 
marvel at the confidence with which "confidence intervals" are presented.' 

Confidence limits purport to predict from the results of one experi- 
ment what will happen when the experiment is repeated under the 
same (as nearly as possible) conditions (see § 7.9). But the experimentalist 
will not need much persuading that the only way to find out what will 
happen is actually to repeat the experiment and see. And on the few 
occasions when this has been done in the biological field the results 
have been no more encouraging than those just quoted. For example, 
Dews and Berkson (1964) found that the internal estimates of error 




calculated in individual biological assays were mostly considerably 
lower than the true error found by actual repetition of the assay. As 
Dews and Berkson point out, if the assays were performed at different 
times or in different laboratories it would probably be said that there 
were 'inter-time' or 'inter-laboratory' differences; and if there were
no such 'obvious' reasons for the internal error estimates being too
low, then probably 'the animals would be stigmatized as "heterogeneous",
with more than a hint that there had been too little incestuous
activity among them'. The moral is once again that confidence limits,
or other estimates of error calculated from the internal evidence of an 
experiment, must be interpreted as lower bounds for the real error. 

Nevertheless, on the grounds that a minimum estimate of error is 
better than none at all, examples follow. Their interpretation is 
discussed further in § 7.9. 

7.3. Nonparametric confidence limits for the median 

Limits can be found very simply indeed, without any calculation 
at all, using the table of Nair (1940) which is reproduced as Table A1.

Consider, for example, determinations of the glomerular filtration 
rate (ml/min) from nine randomly selected dogs : 

135 133 154 124 153 142 140 134 138. 

The observations will be denoted in the usual way (§ 2.1), $y_i$ ($i = 1, 2,
\ldots, n$), and $n = 9$. Now rank the observations in ascending order: 124,
133, 134, 135, 138, 140, 142, 153, 154. These observations will be
denoted $y_{(i)}$ ($i = 1, 2, \ldots, 9$), the parenthesized subscript being used to
indicate that the observations have been ranked, i.e. $y_1$ simply denotes
the first observation written down, whereas $y_{(1)}$ indicates the smallest
of the observations. The sample estimate of the population median is
$y_{(5)} = 138$ ml/min (using (2.5.5)). Reference to Table A1, for the
approximately 95 per cent confidence limits, and for a sample size
$n = 9$, gives a value $r = 2$. This means that the second (i.e. the rth)
observation from each end, viz. 133 ml/min ($= y_{(2)}$) and 153 ml/min
($= y_{(8)}$), are to be taken as the confidence limits for the estimated
median, 138 ml/min. The table also gives, in the next column after r,
the figure 96.1, which indicates that these are actually 96.1 per cent
confidence limits. The fact that r has to be a whole number makes
it impossible to get exactly 95 per cent limits. There is a probability of
0.961 that the population median is between $y_{(2)}$ and $y_{(8)}$ in the sense
explained in § 7.9.




The reasoning behind the construction of Table A1 is roughly as
follows (see Nair (1940) and Mood and Graybill (1963, p. 407)). Let m
denote the population (true) median. By definition of the median
(§ 2.5) the probability is 1/2 that an observation selected at random
from the population, which is assumed to follow any continuous
distribution, will be less than m. The probability that i observations
out of n fall below m follows directly from the binomial distribution
(3.4.3) with $\mathcal{P} = \tfrac{1}{2}$, i.e.

$$P(i) = \frac{n!}{i!\,(n-i)!}\left(\frac{1}{2}\right)^{n}. \tag{7.3.1}$$

To find from this the probability that the rth ranked observation,
$y_{(r)}$, in a sample of n observations, will be greater than the population
median, note that this will be the case if the sample contains either
$i = 0$ or 1 or ... or $(r-1)$ observations below the median, so, by using the
addition rule (2.4.2),

$$P[y_{(r)} > m] = \sum_{i=0}^{r-1} \frac{n!}{i!\,(n-i)!}\left(\frac{1}{2}\right)^{n}. \tag{7.3.2}$$

If a 95 per cent confidence limit is required, r is now chosen so as to
make this expression as near as possible to 0.025 (2.5 per cent). In the
above example this means taking $r = 2$, giving

$$P[y_{(2)} > m] = \sum_{i=0}^{1} \frac{9!}{i!\,(9-i)!}\left(\frac{1}{2}\right)^{9} = \frac{1}{512} + \frac{9}{512} = 0.0195,$$

i.e. it is unlikely that $y_{(2)}$ will be above the population median. Because
of the symmetry of the binomial distribution when $\mathcal{P} = \tfrac{1}{2}$ (§ 3.4)
this is also the probability that $y_{(8)} < m$; it is equally unlikely that
$y_{(8)}$ is less than the population median. Thus, in general, (7.3.2) also
gives $P[y_{(n-r+1)} < m]$. So the probability of the event that either
$m < y_{(2)}$ or $m > y_{(8)}$ is, again by the addition rule (2.4.2), 0.0195
+ 0.0195 = 0.039. If this event does not occur then it must† be that

† If you find this argument takes you by surprise, in spite of its mathematical
impeccability, you may be relieved to find that this view is shared by some of the most
eminent mathematical statisticians. For example, Lindley (1969) says 'The procedure
which transfers a distribution on x to one on $\theta$ through a pivotal quantity such as $x - \theta$
has always seemed to me to be reminiscent of a conjuring trick: it all looks very plausible,
but you cannot see how it is done . . . . As a young man I remember asking E. C. Fieller
to suggest a really difficult problem. His answer was beautifully simple: "The probability
that an observation is less than the median is 1/2: explain why this means that the
probability that the median is greater than the observation is also 1/2." I could offer no
really sound explanation then, and I still cannot.' You may also be relieved to find that,
in spite of the difficulties, virtually all statisticians, faced with experimental results 
such as those in this section, would reach a conclusion that differed little, if at all, from 
that presented here. 




$y_{(2)} \le m \le y_{(8)}$, and the probability of this must be 1 − 0.039 = 0.961,
as discovered above from Table A1. The general result is

$$P[y_{(r)} \le m \le y_{(n-r+1)}] = 1 - 2\sum_{i=0}^{r-1} \frac{n!}{i!\,(n-i)!}\left(\frac{1}{2}\right)^{n}, \tag{7.3.3}$$

and r is chosen so that this is as near as possible, given that r must 
be a whole number, to 0-95, or whatever other confidence probability 
is required. A very similar sort of statement is found for the mean in 
the next section. 
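The calculation behind the tabulated value of r can be sketched in a few lines. This is an illustration of the general result (7.3.3), not a reproduction of Nair's table: the function name is invented, and r is chosen here simply as the whole number whose confidence probability lies nearest to the value requested.

```python
from math import comb


def median_confidence_limits(observations, confidence=0.95):
    """Nonparametric limits y_(r) and y_(n-r+1) for the population median,
    with r chosen so that the probability is as near as possible to
    `confidence`. Returns (r, lower limit, upper limit, probability)."""
    y = sorted(observations)
    n = len(y)

    def probability(r):
        # 1 - 2 * sum_{i=0}^{r-1} n! / (i! (n-i)!) * (1/2)^n, as in (7.3.3)
        one_tail = sum(comb(n, i) for i in range(r)) / 2 ** n
        return 1 - 2 * one_tail

    candidates = range(1, n // 2 + 1)
    r = min(candidates, key=lambda k: abs(probability(k) - confidence))
    return r, y[r - 1], y[n - r], probability(r)


# Glomerular filtration rates (ml/min) of the nine dogs in this section.
gfr = [135, 133, 154, 124, 153, 142, 140, 134, 138]
r, lower, upper, prob = median_confidence_limits(gfr)
print(f"r = {r}: limits {lower} to {upper} ml/min, probability {prob:.3f}")
```

For the nine observations this selects the second observation from each end, the same limits as were read from the table.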

The method assumes that the distribution of the observations is 
continuous (see § 4.1) so it is not possible for two observations to be 
exactly the same. In practice there may be ties because of rounding 
errors but this does not matter even though very occasionally a sample
could give the same limits for, say, 95 and 99 per cent. If the distribution
is really discontinuous then the method is not appropriate. 

7.4. Confidence limits for the mean of a normally distributed
variable†

Confidence limits for the population mean 

In the improbable event that the glomerular filtration rate of dogs 
was known to follow the normal distribution it would be possible to 
calculate confidence limits for the mean of the nine observations given 
in § 7.3. The sample mean is $\Sigma y/n = 1253/9 = 139.2$ ml/min, compared
with the sample median of 138 ml/min. The sum of squared deviations
is given by (2.6.5) as

$$\sum_{i=1}^{n}(y_i-\bar{y})^2 = \sum_{i=1}^{n} y_i^2 - \frac{(\Sigma y_i)^2}{n} = 175179 - \frac{(1253)^2}{9} = 733.56.$$

Therefore the variance of y is estimated to be $s^2(y) = 733.56/(9-1)
= 91.69$; the variance of the mean is $s^2(\bar{y}) = 91.69/9 = 10.19$ by eqn.
(2.7.8), and the estimated standard deviation of the sample mean
glomerular filtration rate is $s(\bar{y}) = \sqrt{10.19}$ ml/min = 3.192 ml/min.
These estimates have $n-1 = 8$ degrees of freedom (§ 2.6). From this
estimate of scatter, and the assumption that y (and therefore $\bar{y}$) is
normally distributed, limits can be calculated within which the mean
of the population from which the observations were drawn (which may

† The assumption of normality could be tested as in § 4.6 if there were more observations,
but with one sample of 9 no useful test can be made.






or not be the population in which the investigator is really interested) 
is likely to lie. 

The limits must be based on Student's t distribution (§4.4) because 
only the estimated standard deviation is available. Reference to tables 
(see § 4.4) shows that, in the long run, 95 per cent of values of t (with 
8 d.f.) will fall between $t = -2.306$ and $t = +2.306$. The definition of
t (eqn. (4.4.1)) is $(x-\mu)/s(x)$ where x is normally distributed. In the
present example the (assumed) normally distributed variable of interest
is the sample mean, $\bar{y}$, so t is defined as $(\bar{y}-\mu)/s(\bar{y})$.

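It follows that in 95 per cent of experiments $t = (\bar{y}-\mu)/s(\bar{y})$ is
expected to lie between −2.306 and +2.306, i.e.

$$\begin{aligned}
&P[-2.306 \le (\bar{y}-\mu)/s(\bar{y}) \le +2.306] = 0.95,\\
\therefore\ &P[-2.306\,s(\bar{y}) \le (\bar{y}-\mu) \le +2.306\,s(\bar{y})] = 0.95,\\
\therefore\ &P[\bar{y}-2.306\,s(\bar{y}) \le \mu \le \bar{y}+2.306\,s(\bar{y})] = 0.95.^{\dagger}
\end{aligned}$$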

This statement, which is analogous to (7.3.3), indicates our confidence
that the population mean, $\mu$, lies between the P = 0.95 confidence
limits, viz. $\bar{y} - 2.306\,s(\bar{y}) = 139.2 - (2.306 \times 3.192) = 131.8$ ml/min and
$\bar{y} + 2.306\,s(\bar{y}) = 139.2 + (2.306 \times 3.192) = 146.6$ ml/min. Compare the
mean 139.2 ml/min and its P = 0.95 Gaussian confidence limits,
131.8 to 146.6 ml/min, with the median and its confidence limits found,
with fewer assumptions, in § 7.3.
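The arithmetic of this example can be retraced with the standard library alone. The sketch below takes the value of Student's t (2.306 for P = 0.95 with 8 d.f.) from the text rather than computing it, and the variable names are chosen here for illustration. Because the text rounds the mean to 139.2 before forming the limits, the results agree with it only to within rounding in the last figure.

```python
from math import sqrt
from statistics import mean, variance

# Glomerular filtration rates (ml/min) of the nine dogs in § 7.3.
gfr = [135, 133, 154, 124, 153, 142, 140, 134, 138]
T_95_8DF = 2.306   # Student's t for P = 0.95 with 8 d.f., as quoted in the text

n = len(gfr)
y_bar = mean(gfr)            # sample mean
s2_y = variance(gfr)         # estimated variance of y (divisor n - 1)
s_mean = sqrt(s2_y / n)      # estimated standard deviation of the mean

lower = y_bar - T_95_8DF * s_mean
upper = y_bar + T_95_8DF * s_mean
print(f"mean = {y_bar:.2f}, s(mean) = {s_mean:.3f} ml/min")
print(f"P = 0.95 limits: {lower:.2f} to {upper:.2f} ml/min")
```

Replacing the constant by the value of t for another probability (with the same 8 d.f.) gives the corresponding wider or narrower limits.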

Condensing the above argument into one formula, the Gaussian
confidence limits for $\mu$ (given an estimate of it, $\bar{y}$, the mean of a sample
of n normally distributed observations) are

$$\bar{y} \pm t\,s(\bar{y}). \tag{7.4.1}$$

In general, the confidence limits for any normally distributed variable,
x, are

$$x \pm t\,s(x), \tag{7.4.2}$$

where the value of Student's t is taken from tables (see § 4.4) for the
probability required and for the number of degrees of freedom associated
with $s(x)$.

To be more sure that the limits will include $\mu$ (the population
value of $\bar{y}$, see § A1.1), they must be made wider. For example, the
value of t for P = 0.99 with 8 d.f. is, from tables, 3.355. That is, 0.5
per cent (0.005) of the area under the curve for the distribution of t
with 8 d.f. lies below −3.355 and another 0.5 per cent above +3.355,
and 99 per cent lies between these figures. The 99 per cent Gaussian
confidence limits are then $\bar{y} \pm t\,s(\bar{y})$, i.e. 128.5 to 149.9 ml/min.

† If this argument shakes you, see the footnote in § 7.3 (p. 104), reading $\bar{y}$ for x and $\mu$
for $\theta$.

Confidence limits for new observations 

The limits just found were those expected to contain $\mu$, the population mean
value of y (and also of $\bar{y}_n$ and $\bar{y}_m$, see below). If limits are required within which a
new observation from the same population is expected to lie the result is rather
different. Suppose, as above, that n observations are made of a normally distributed
variable, y. The sample mean is $\bar{y}_n$ and the sample standard deviation $s(y)$, say.
If a further m independent observations were to be made on the same population,
within what limits would their mean, $\bar{y}_m$, be expected to lie? The variable
$\bar{y}_n - \bar{y}_m$ will be normally distributed with a population value $\mu - \mu = 0$, so
$t = (\bar{y}_n - \bar{y}_m)/s[\bar{y}_n - \bar{y}_m]$ (see § 4.4). Using (2.7.3) and (2.7.8) the estimated variance
is $s^2[\bar{y}_n - \bar{y}_m] = s^2(\bar{y}_n) + s^2(\bar{y}_m) = s^2(y)/n + s^2(y)/m = s^2(y)(1/n + 1/m)$. The best
prediction of the new observation, $\bar{y}_m$, will, of course, be the observed mean, $\bar{y}_n$.
This is the same as the estimate of $\mu$, but the confidence limits must be wider
because of the error of the new observations. As above,
$P[-t \le (\bar{y}_n - \bar{y}_m)/s[\bar{y}_n - \bar{y}_m] \le +t] = 0.95$ so, by rearranging this as before,
the confidence limits for $\bar{y}_m$ are found to be

$$\bar{y}_n \pm t\sqrt{\left\{s^2(y)\left(\frac{1}{n}+\frac{1}{m}\right)\right\}}. \tag{7.4.3}$$

For example, a single new observation (m = 1) of glomerular filtration rate 
would have a 95 per cent chance (in the sense explained in § 7.9) of lying within 
the limits calculated from (7.4.3), viz. 139*2 ± 2-306V[91-69(l/9 + l)]; that is, 
from 116*9 ml/min to 162-6 ml/min. These limits are far wider than those for ft. 

Where m is very large, (7.4.3) reduces to the Gaussian limits for $\mu$, eqn (7.4.1)
(as expected, because in this case $\bar{y}_m$ becomes the same thing as $\mu$).
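The two kinds of limit can be compared numerically. The following is a minimal sketch, not part of the original text: the function names are invented for illustration, and the values of Student's t for 8 d.f. (2.306 and 3.355) are simply those quoted in this section rather than being computed.

```python
import math

def limits_for_mean(ybar, var_y, n, t):
    """Limits for the population mean: ybar +/- t*s(ybar), with s^2(ybar) = s^2(y)/n."""
    half_width = t * math.sqrt(var_y / n)
    return ybar - half_width, ybar + half_width

def limits_for_new_mean(ybar, var_y, n, m, t):
    """Limits for the mean of m new observations, following eqn (7.4.3)."""
    half_width = t * math.sqrt(var_y * (1.0 / n + 1.0 / m))
    return ybar - half_width, ybar + half_width

# Figures quoted in the text: mean 139.2 ml/min, s^2(y) = 91.69, n = 9 (8 d.f.).
YBAR, VAR_Y, N = 139.2, 91.69, 9
T_95, T_99 = 2.306, 3.355  # Student's t for 8 d.f., as quoted in the text

print("99% limits for the population mean:", limits_for_mean(YBAR, VAR_Y, N, T_99))
print("95% limits for one new observation:", limits_for_new_mean(YBAR, VAR_Y, N, 1, T_95))
# With a very large m the second formula approaches the first.
print("95% limits for the mean of very many new observations:",
      limits_for_new_mean(YBAR, VAR_Y, N, 10**9, T_95))
```

The last line illustrates the remark that the limits for a very large number of new observations coincide with those for the population mean.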

It is important to notice the condition that the m new observations are from 
the same Gaussian population as the original n. As they are probably made 
later in time there may have been a change that invalidates this assumption. 

7.5. Confidence limits for the ratio of two normally distributed
observations

If a and b are normally distributed variables, their ratio m = a/b
will not be normally distributed, so if the approximate variance of the
ratio is obtained from (2.7.16), it is only correct to calculate limits
by the methods of § 7.4 if the denominator is large compared with its
standard deviation (i.e. if g is small, see § 13.5). This problem is quite
a common one because it is often the ratio of two observations that is
of interest rather than, say, their difference.

If a and b were lognormally distributed (see § 4.5) then log a and
log b would be normally distributed, and so log m = log a − log b would






be normally distributed with var(log m) = var(log a) + var(log b)
from (2.7.3) (given independence). Thus confidence limits for log m
could be calculated as in § 7.4, $\log m \pm t\sqrt{[\mathrm{var}(\log m)]}$, and the
antilogarithms found. See § 14.1 for a discussion of this procedure.

When a and b are normally distributed the exact solution is not
difficult. But, because it looks more complicated at first sight, it
will be postponed until § 13.5 (see also § 14.1), nearer to the numerical
examples of its use in §§ 13.11-13.15.
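The logarithmic procedure just described might be sketched as follows. This is only an illustration: the function name and every number in it (the two observations, the variances of their logarithms, and the value of t) are hypothetical and are not taken from the text. Natural logarithms are assumed; any base would serve provided the variances are expressed in the same base.

```python
import math

def ratio_limits_lognormal(a, b, var_log_a, var_log_b, t):
    """Limits for m = a/b found on the log scale and then transformed back."""
    log_m = math.log(a) - math.log(b)
    # var(log m) = var(log a) + var(log b), assuming a and b are independent
    half_width = t * math.sqrt(var_log_a + var_log_b)
    return math.exp(log_m - half_width), math.exp(log_m + half_width)

# Hypothetical inputs, chosen only to show the arithmetic.
low, high = ratio_limits_lognormal(a=12.0, b=4.0,
                                   var_log_a=0.01, var_log_b=0.02, t=2.0)
print("limits for the ratio:", low, high)
```

The limits are symmetrical about log m but not about m itself: the observed ratio is the geometric mean of the two limits, not their arithmetic mean.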

7.6. Another way of looking at confidence limits 

A more general method of arriving at confidence intervals will be 
needed in §§ 7.7 and 7.8. The ingenious argument, showing that limits 
found in the following way can be interpreted as described in § 7.9, is 
discussed clearly by, for example, Brownlee (1965, pp. 121-32). It will 
be enough here to show that the results in § 7.4 can be obtained by a 
rather different approach. 

For simplicity it will be supposed at first that the population standard
deviation, $\sigma$, is known. It is expected that in the long run 95 per cent
of randomly selected observations of a normally distributed variable
y, with population mean $\mu$ and population standard deviation $\sigma$, will
fall within $\mu \pm 1.96\sigma$ (see § 4.2). In § 7.4 the normally distributed
variable of interest was $\bar{y}$, the mean of n observations, and similarly,
in the long run, 95 per cent of such means would be expected to fall
within $\mu \pm 1.96\sigma(\bar{y})$, where $\sigma(\bar{y}) = \sigma/\sqrt{n}$ (by (2.7.9)). The problem is
to find limits that are likely to include the unknown value of $\mu$.

Now consider various possible values of $\mu$. It seems reasonable to
take as a lower limit, $\mu_L$ say, a value which, if it were the true value,
would make the observation of a mean as large as that actually
observed ($\bar{y}_{obs}$) or larger a rare event, an event that would only
occur in 2.5 per cent of repeated trials in the long run, for example. In
Fig. 7.6.1(a) the normal distribution of $\bar{y}$ is shown with the known
standard deviation $\sigma(\bar{y})$, and the hypothetical mean $\mu_L$ chosen in the
way just described.

Similarly, the highest reasonable value for $\mu$, say $\mu_H$, could be
chosen so that, if it were the true value, the observation of a mean
equal to or less than $\bar{y}_{obs}$ would be a rare event (again P = 0.025, say).
This is shown in Fig. 7.6.1(b). It is clear from the graphs that
$\bar{y}_{obs} = \mu_L + 1.96\sigma(\bar{y}) = \mu_H - 1.96\sigma(\bar{y})$. Rearranging this gives
$\mu_L = \bar{y}_{obs} - 1.96\sigma(\bar{y})$, and $\mu_H = \bar{y}_{obs} + 1.96\sigma(\bar{y})$. If $\sigma$ is not known but has to be
estimated from the observations, then $\sigma(\bar{y})$ must be replaced by




$s(\bar{y})$, and so 1.96 must be replaced by the appropriate value of Student's
t (for example 2.306 for P = 0.95 limits in the example in § 7.4; see
also § 4.4). When this is done $\mu_L$ and $\mu_H$ are the limits previously found
using (7.4.1).
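The argument of this section can be checked numerically for the case in which the standard deviation of the mean is known. The sketch below is an illustration only: the function names are invented, the observed mean and the 'known' standard deviation of the mean are hypothetical values, and the limits are found by simple bisection rather than by algebra.

```python
from statistics import NormalDist

def bisect(f, lo, hi, iterations=200):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def limits_by_inversion(y_obs, sigma_mean, tail=0.025):
    """Find the hypothetical means that make the observed mean a 'rare event'."""
    span = 10.0 * sigma_mean
    # Lower limit: if it were the true mean, a mean >= y_obs has probability `tail`.
    mu_low = bisect(
        lambda mu: (1.0 - NormalDist(mu, sigma_mean).cdf(y_obs)) - tail,
        y_obs - span, y_obs)
    # Upper limit: if it were the true mean, a mean <= y_obs has probability `tail`.
    mu_high = bisect(
        lambda mu: NormalDist(mu, sigma_mean).cdf(y_obs) - tail,
        y_obs, y_obs + span)
    return mu_low, mu_high

Y_OBS = 139.2       # hypothetical observed mean
SIGMA_MEAN = 3.0    # hypothetical, assumed-known standard deviation of the mean

low, high = limits_by_inversion(Y_OBS, SIGMA_MEAN)
print("limits found by inversion:", low, high)
print("observed mean -/+ 1.96 sigma:", Y_OBS - 1.96 * SIGMA_MEAN, Y_OBS + 1.96 * SIGMA_MEAN)
```

The two printed pairs should agree closely, which is all that the rearrangement in the text asserts.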

Fig. 7.6.1. One way of looking at confidence limits. See text.

7.7. What is the probability of 'success'? Confidence limits for
the binomial probability

In §§ 3.2-3.5 it was described how the number of successes (r) out of
n trials of an event would be expected to vary in repeated sets of n
trials when the probability of 'success' was $\mathcal{P}$ at each trial and the
probability of 'failure' was $1 - \mathcal{P}$. Usually, of course, the problem is
reversed. $\mathcal{P}$ is unknown and must be estimated from the experimental
results. For example, if a drug were observed to cause improvement in
r = 7 out of n = 8 patients, who had been selected strictly randomly
(see § 2.3) from some population of patients, then the best estimate of
the proportion ($\mathcal{P}$) of patients in the population that would improve






when given the drug is r/n (as in § 3.4), i.e. 7/8 = 0.875 or 87.5 per
cent. What is the error of this estimate? Would it be unreasonable,
for example, to suppose that the population contained only 50 per cent
of 'improvers'? The answer can be found without any calculation at
all using Table A2, which is based on the following reasoning.

The approach described in § 7.6 can be used to find confidence limits for the
population value of $\mathcal{P}$. For concreteness suppose that 95 per cent (or P = 0.95)
confidence limits are required for the population value of $\mathcal{P}$ when $r_{obs}$ 'successes'
have been observed out of n trials. The highest reasonable value of $\mathcal{P}$, $\mathcal{P}_H$ say,
will be taken as the value that, if it were the true value, would make the
observation of $r_{obs}$ or fewer successes a rare event (an event occurring in only 2.5 per cent
of repeated sets of n trials). Now the probability of r successes, P(r), is given
by (3.4.3), and $r \le r_{obs}$ if r = 0 or 1 or ... or $r_{obs}$, so, using (3.4.3) and the
addition rule, (2.4.2), it is required that

$$P[r \le r_{obs}] = \sum_{r=0}^{r_{obs}} \frac{n!}{r!\,(n-r)!}\,\mathcal{P}_H^{\,r}\,(1-\mathcal{P}_H)^{n-r} = 0.025. \qquad (7.7.1)$$

The only unknown in this equation is $\mathcal{P}_H$, the upper confidence limit for the
population proportion, so it can be solved for $\mathcal{P}_H$. There is no simple way of
rearranging the equation to get $\mathcal{P}_H$ however, so tables are provided (Table A2)
giving the solution. Similarly, the lowest reasonable value, $\mathcal{P}_L$, for the population
$\mathcal{P}$ (the lower confidence limit for $\mathcal{P}$) is taken as the value that, if it were the
true value, would make the observation of $r_{obs}$ successes or more (i.e. $r = r_{obs}$
or $r_{obs} + 1$ or ... or n) a rare event. Thus $\mathcal{P}_L$ is found by solving

$$P[r \ge r_{obs}] = \sum_{r=r_{obs}}^{n} \frac{n!}{r!\,(n-r)!}\,\mathcal{P}_L^{\,r}\,(1-\mathcal{P}_L)^{n-r} = 0.025. \qquad (7.7.2)$$

Again the solution is tabulated in Table A2.
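Although there is no simple rearrangement of (7.7.1) and (7.7.2), they are easily solved numerically. The following is a minimal sketch under stated assumptions: the function names are invented for illustration, bisection is merely one convenient way of solving the equations (it is not claimed to be how Table A2 was computed), and the conventions used when r = 0 or r = n (limits of exactly 0 and 1) are assumptions.

```python
from math import comb

def binom_pmf(r, n, p):
    """Binomial probability of exactly r successes in n trials."""
    return comb(n, r) * p**r * (1.0 - p)**(n - r)

def prob_at_most(r_obs, n, p):
    return sum(binom_pmf(r, n, p) for r in range(0, r_obs + 1))

def prob_at_least(r_obs, n, p):
    return sum(binom_pmf(r, n, p) for r in range(r_obs, n + 1))

def bisect(f, lo, hi, iterations=100):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def binomial_limits(r_obs, n, confidence=0.95):
    """Solve (7.7.2) for the lower limit and (7.7.1) for the upper limit."""
    tail = (1.0 - confidence) / 2.0
    # Assumed conventions: lower limit 0 when r_obs = 0, upper limit 1 when r_obs = n.
    lower = 0.0 if r_obs == 0 else bisect(
        lambda p: prob_at_least(r_obs, n, p) - tail, 0.0, 1.0)
    upper = 1.0 if r_obs == n else bisect(
        lambda p: prob_at_most(r_obs, n, p) - tail, 0.0, 1.0)
    return lower, upper

low, high = binomial_limits(7, 8, confidence=0.95)
print("95 per cent limits for r = 7, n = 8 (as percentages):", 100 * low, 100 * high)
```

For r = 7 and n = 8 the result can be compared with the limits quoted from Table A2 in the worked example of this section.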

The use of Table A2 

Confidence limits (95 and 99 per cent) for the population value of
$100\mathcal{P}$ are tabulated for any observed r, and sample sizes from n = 2
to n = 30, and also some values for n = 1000 for comparison. Other
sample sizes are tabulated in the Documenta Geigy Scientific Tables
(1962, pp. 86-103). In the example at the beginning of this section
r = 7 out of n = 8 patients improved (100r/n = 87.5 per cent
improvement). Consulting Table A2 with n = 8 and r = 7 shows
that the P = 0.95 confidence limits ($100\mathcal{P}_L$ to $100\mathcal{P}_H$, from (7.7.1)
and (7.7.2)) are 47.35 to 99.68 per cent. In other words, if repeated
samples of 8 were taken from a population that actually contained
47.35 per cent of improvers, 2.5 per cent of the samples would contain
7 or more (i.e. 7 or 8) improvers. And if the population actually
contained 99.68 per cent of improvers then, in the long run, 2.5 per cent






of samples would contain 7 or fewer improvers. Thus, if the drug were
tested on an infinite sample (rather than only 8) it would not be
surprising (see § 7.9 for a more precise interpretation) to find any
proportion of patients improving between $\mathcal{P}_L = 0.4735$ and $\mathcal{P}_H = 0.9968$.
The observation is compatible with any hypothetical population $\mathcal{P}$
that lies between the confidence limits (see § 9.4) so the observation of
7 improving out of 8 cannot be considered incompatible with a true
improvement rate of 50 per cent ($\mathcal{P} = 0.5$) at the P = 0.95 level of
significance. For greater certainty the P = 0.99 confidence limits
would be found from the tables. They are, of course, even wider,
36.85 to 99.94 per cent. A sample of 8 gives surprisingly little
information about the population it was drawn from, even when all the
assumptions of randomness and simple sampling (see § 3.2) are fulfilled.

The comparison of two observed binomial proportions is a different 
problem. It is discussed in Chapter 8. 

7.8. The black magical assay of purity in heart as an example of 
binomial sampling 

In a sadly neglected paper, Oakley (1943) proposed an assay method 
for purity in heart. Oakley points out that lack of statistical knowledge 
may vitiate a worth-while experiment the apparent failure of which 
may deter others from repeating it, and that this fate seems to have 
overtaken an experiment carried out many years ago in Germany. 
The only known source (Anon 1932) describes the experiment thus:

'The legend of the Brocken (the famous peak in the Harz Mountains noted for its
"spectre" and as the haunt of witches on Walpurgis Night), according to which a
"virgin he-goat" can be converted into "a youth of surpassing beauty" by
spells performed in a magic circle at midnight, was tested on June 17th by
British and German scientists and investigators, including Professor Joad and
Mr. Harry Price of the National Institute of Psychical Research. The object was
to expose the fallacy of Black Magic and also to pay a tribute to Goethe, who used
the legend in "Faust". Some wore evening dress. The goat was anointed with the
prescribed compound of scrapings from church bells, bats' blood, soot and honey.
The necessary "maiden pure in heart" who removed the white sheet from the
goat at the critical moment, was Fraulein Urta Bohn, daughter of one of the
German professors taking part in the test. Her mother was a Scotswoman
(formerly Miss Gordon). The scene was floodlit and filmed. As our photographs show,
the goat remained a goat, and the legend of the Brocken was dispelled!'

The main variables are the virgin he-goat and the maiden pure in 
heart. Virginity may for the present be regarded as an absolute char- 
acter, but purity in heart no doubt varies from person to person. 


Oakley therefore supposed that it might be possible to estimate the 
purity in heart index (PHI) of a maiden by observing how many of a 
group of he-goats are converted into young men. The original experi- 
menters were clearly guilty of a grave scientific error in using only one 
he-goat. 

We shall assume, as Oakley did, that the conversion of he-goats into 
young men is an all-or-nothing process ; either complete conversion or 
nothing occurs. Oakley supposed, on this basis, that a comparison 
could be made between, on one hand, the percentage of he-goats 
converted by maidens of various degrees of purity in heart, and, on the 
other hand, the sort of pharmacological experiment that involves the 
measurement of the percentage of individuals showing a specified 
effect in response to various doses of a drug. In conformity with the 
common pharmacological practice he supposed that a plot of percent- 
age he-goat conversion against log purity in heart index (log PHI) 
would have the sigmoid form shown in Fig. 14.2.4. As explained in
Chapter 14, this implies that log PHI required to convert individual
he-goats is a normally distributed variable. Furthermore it means that
infinite purity in heart is required to produce a population he-goat
conversion rate (HGCR) of 100 per cent.

Although there is a lack of experimental evidence on this point, 
the present author feels that the assumption of a normal distribution 
is, as so often happens, without foundation (see § 4.2). The implication 
of the normality assumption, that there exist he-goats so resistant to 
conversion that infinite purity in heart is needed to affect them, has 
not been (and cannot be) experimentally verified. Furthermore the 
very idea of infinite purity in heart seems likely to cause despondency 
in most people, and should therefore be avoided until such time as its 
necessity may be demonstrated experimentally. Oakley's treatment of 
the problem requires, in addition, that PHI be treated as an independent 
variable (in the regression sense, see Chapter 12), which raises problems 
because there is no known method of measuring PHI other than he-goat 
conversion. 

In the light of these remarks it appears to the present author desirable
that the purity in heart index should be redefined simply as the
population percentage of he-goats converted.† This simple operational
definition means that the PHI of all maidens will fall between 0 and
100, and confidence limits for the true PHI can be found easily from

† i.e., in the more rigorous notation of Appendix 1, PHI = E[HGCR].




the observed conversion rate (which should be binomially distributed,
see §§ 3.2-3.5) using Table A2, as explained in § 7.7.

For example, if it were observed that a particular maiden caused
conversion of r = 2 out of n = 4 he-goats, the estimated PHI would be
100 × 2/4 = 50 per cent, and from Table A2 the confidence limits
(P = 0.95) for the PHI are 6.76 to 93.24 per cent. Clearly the information
to be gained from a sample of only four he-goats is so imprecise that it
is difficult to conceive what use it could be put to. Oakley recommended
that for preliminary experiments at least n = 10 he-goats should be
used. If r = 5 (50 per cent) of these were observed to be converted
Table A2 would give the confidence limits (P = 0.95) for the true PHI
as 18.71 to 81.29 per cent. While the most extreme forms of vice and of
virtue appear to be ruled out by this result, there is still considerable
uncertainty about the PHI. If a greater degree of confidence were
required, as might happen, for example, if a potential husband
demanded a certain minimum (or, alternatively, a certain maximum)
PHI before committing himself, the P = 0.99 confidence limits could
be found from Table A2. They are 12.83 to 87.17 per cent. The most
tolerant suitor might be forgiven for requiring a larger sample.

These calculations show that the assay is subject to considerable
experimental error; and the problem of measuring very high or very
low PHIs is even more difficult (because percentage responses around
50 per cent are the most accurately determined).† If the practical
difficulties involved in using samples of n = 1000 he-goats could be
overcome, PHIs not too far from 50 per cent could be determined with
reasonable accuracy. For r = 500 converted, the confidence limits
(P = 0.95) from Table A2 are 46.85 to 53.15 per cent. If only r = 10
he-goats were converted out of 1000 (1 per cent) the confidence limits
(P = 0.95) should be 0.48 to 1.84 per cent. Although the relative error is
a good deal bigger than for conversion rates near 50 per cent, this is
likely to be precise enough for practical purposes.

A more precise and economical assay is clearly needed, but until 
more experimental work is done the present method will have to do. 
However, as Oakley points out, 'All thoughtful persons must regard 
the indiscriminate conversion of he-goats into young men with concern, 
for there is no knowledge of what education or social, political or 

† This depends on what is meant by accuracy. It is true if one is interested in the
relative error of the proportion converted (or not converted, whichever is the smaller). 
It is also true if, as in Chapter 14, one is interested in the error of the dose producing a 
specified proportion converted in quantal experiments. 




economic views such young men might have, and it might well be that 
their behaviour would bring scientific experiment into disrepute. This 
is, however, a problem for necromancers rather than statisticians.' 

7.9. Interpretation of confidence limits 

The logical basis and interpretation of confidence limits are, even now,
a matter of controversy. However, few people would contest the
statement that if P = 0.95 (say) limits were calculated according to
the above rules in each of a large number of experiments then, in the




Fig. 7.9.1. Interpretation of confidence limits. Repeated estimates (e.g.
sample mean) of a parameter (e.g. population mean), and their 95 per cent
confidence limits, plotted against experiment number (1 to 20). In this (ideal)
case one experiment (number 7) out of twenty gave confidence limits that do
not include the population value. One in twenty is the predicted long-run
frequency.

long run, 95 per cent of the intervals so calculated would include the
population mean (§ 7.4) or median (§ 7.3), $\mu$, if the assumptions made
in the calculation were true. The limits must be regarded as optimistic
as explained in § 7.2.

In any particular experiment a single confidence interval is calculated
which obviously either does or does not include $\mu$. It might therefore
be thought that it could be said a priori that the probability that the
interval includes $\mu$ is either 0 or 1, but not some intermediate value.
However, in a series of identically conducted experiments, somewhat
different values of the sample median or mean, and of the sample






scatter, for example of s(y), will, in general, be found in every
experiment. The confidence limits will therefore be different from experiment
to experiment. The prediction is that, in the long run, 95 per cent (19
out of 20) of such limits will include $\mu$, as illustrated in Fig. 7.9.1. It is
not predicted that in 95 per cent of experiments the true mean will
fall within the particular set of limits calculated in the one actual
experiment.

Thus, if one were willing to consider that the actual experiment
was a random sample from the population of experiments that might
have been done, i.e. that 'nature has done the shuffling', one could go
further and say that there was a 95 per cent chance of having done an
experiment in which the calculated limits include the true mean, $\mu$.
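The long-run interpretation can be illustrated by simulating many identically conducted experiments and counting how often the calculated limits include the population mean. This is a sketch only: the population mean and standard deviation, the number of experiments, and the random seed are all arbitrary assumptions, and the value of Student's t (2.306 for 8 d.f.) is the one quoted in § 7.4.

```python
import random
import statistics

T_8DF_95 = 2.306  # Student's t for 8 d.f. and P = 0.95, as quoted in section 7.4

def confidence_interval(sample, t):
    """Gaussian limits: sample mean +/- t * s(y)/sqrt(n)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    s_mean = statistics.stdev(sample) / n**0.5
    return mean - t * s_mean, mean + t * s_mean

def coverage(mu, sigma, n, t, experiments, seed=1971):
    """Fraction of simulated experiments whose limits include the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(experiments):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = confidence_interval(sample, t)
        if low <= mu <= high:
            hits += 1
    return hits / experiments

# Hypothetical population: mean 139.2, standard deviation 9.6; n = 9 per experiment.
print("fraction of intervals including the true mean:",
      coverage(mu=139.2, sigma=9.6, n=9, t=T_8DF_95, experiments=4000))
```

The fraction printed should be close to 0.95, but each individual interval either does or does not include the population mean.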

Another interpretation of confidence intervals will be mentioned 
later during the discussion of significance tests. 






8. Classification measurements 



'In your otherwise beautiful poem, there is a verse which reads:

"Every moment dies a man,
Every moment one is born."

It must be manifest that, were this true, the population of the world would be at
a standstill. In truth the rate of birth is slightly in excess of that of death. I
would suggest that in the next edition of your poem you have it read:

"Every moment dies a man,
Every moment 1 1/16 is born."

Strictly speaking this is not correct. The actual figure is a decimal so long that
I cannot get it in the line, but I believe 1 1/16 will be sufficiently accurate for
poetry.

I am etc.'

Letter said to have been written to Tennyson by Charles Babbage after reading
'The vision of sin' (Mathematical Gazette, 1927, p. 270)



8.1. Two independent samples. Relationship between various
methods

Classification measurements and independent samples were
discussed in § 6.4. Before starting any analysis § 8.7 should be read to
make sure that the results are not actually the incorrectly presented
results of an experiment with related samples. The fundamental
method of analysis for the 2 × 2 table (§ 8.2) is the randomization
method (see § 6.3), which is known as the Fisher exact test (see § 8.2).
There is an approximate method that gives similar results to the
exact test with sufficiently large samples. This method can be written
in two ways, as the normal approximation described in § 8.4 or as the
chi-squared test described in § 8.6. The exact test (§ 8.2) should be
used when the total number of observations, N, is up to 40. Published
tables, which only deal with N ≤ 40, make this easy. When N > 40
the exact test should be calculated directly from (8.2.1) if the frequency
in any cell of the table is very small. When the smallest expected value,
x (see § 8.6), is 5 or more there is reason to believe that the
chi-squared test (§ 8.6) corrected for continuity will be a good
approximation to the exact test (Cochran 1962).




8.2. Two independent samples. The randomization method and 
the Fisher test 

Randomization tests were introduced in § 6.3. As an example of the 
result of classification measurements (see § 6.4), consider the clinical 
comparison of two drugs, X and Y, on seven patients. It is funda- 
mental to any analysis that the allocation of drug X to four of the 
patients, and of Y to the other three be done in a strictly random way 
using random number tables (see §§ 2.3 and 8.3). It is noted, by a 
suitable blind method, whether each patient is improved (I) or not 
improved (N). The result is shown in Table 8.2.1 (b). 



Table 8.2.1

Possible results of the trial. Result (b) was actually observed

               (a)              (b)              (c)              (d)
           I   N  Total     I   N  Total     I   N  Total     I   N  Total
Drug X     4   0    4       3   1    4       2   2    4       1   3    4
Drug Y     0   3    3       1   2    3       2   1    3       3   0    3
Total      4   3    7       4   3    7       4   3    7       4   3    7




With drug X 75 per cent improve (3 out of 4), and with drug Y only
33 1/3 per cent improve. Would this result be likely to occur if X and Y
were really equi-effective? If the drugs are equi-effective then it follows
that whether an improvement is seen or not cannot depend on which
drug is given. In other words, each of the patients would have given
the same result even if he had been given the other drug, so the observed
difference in 'percentage improved' would be merely a result of the
particular way the random numbers in the table came up when the
drugs were being allocated to patients.† For example, for the experiment
in Table 8.2.1 the null hypothesis postulates that of the 7 patients, 4
would improve and 3 would not, quite independently of which drug was
given. If this were so, would it be reasonable to suppose that the
random numbers came up so as to put 3 of the 4 improvers, but only
1 of the 3 non-improvers, in the drug X group (as observed, Table
8.2.1(b))? Or would an allocation giving a result that appeared to

† Of course, if a subject who received treatment X during the trial were given an
equi-effective treatment Y at a later time, the response on the second occasion would
not be exactly the same as during the trial. But it is being postulated that if X and Y
are equi-effective then if one of them is given to a given subject at a given moment in
time, the response would have been exactly the same if the other had been given to the
same subject at the same moment.






favour drug X by as much as (or more than) this, be such a rare
happening as to make one suspect the premise of equi-effectiveness?

Now if the selection was really random, every possible allocation of
drugs to patients should have been equally probable. It is therefore
simply a matter of counting permutations (possible allocations) to
find out whether it is improbable that a random allocation will come
up that will give such a large difference between X and Y groups as
that observed (or a larger difference). Notice that attention is restricted
to the actual 7 patients tested without reference to a larger population
(see also § 8.4). Of the 7 patients, 4 improved and 3 did not.

Three ways of arriving at the answer will be described. 

(a) Physical randomization. On four cards write 'improved' and on
three write 'not improved'. Then rearrange the cards in random order
using random number tables (or, less reliably, shuffle them), mimicking
exactly the method used in the actual experiment. Call the top four
cards drug X and the bottom three drug Y, and note whether or not
the difference between drugs resulting from this allocation of drugs to
patients is as large as, or larger than, that in the experiment. Repeat
this, say, 1000 times and count the proportion of randomizations that
result in a difference between drugs as large as or larger than that in the
experiment. This proportion is P, the result of the (one-tail) significance
test. If it is small it means that the observed result is unlikely to have
arisen solely because of the random allocation that happened to come
up in the real experiment, so the premise (null hypothesis) that the
drugs are equi-effective may have to be abandoned (see § 6.2). This
method would be tedious by hand, though not on a computer, but
fortunately there are easier ways of reaching the same results. The
two-tail test is discussed below.
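The card procedure is easily mimicked on a computer. The sketch below is an illustration under stated assumptions: the function name, the number of repetitions, and the random seed are arbitrary choices, and a pseudo-random shuffle stands in for the random number tables. A 'difference as large as or larger than that observed' is taken to mean 3 or 4 improvers among the four cards called drug X, as in Table 8.2.1(a) and (b).

```python
import random

def one_tail_p_by_shuffling(trials=20000, seed=7):
    """Estimate the one-tail P by repeatedly re-allocating the seven 'cards'."""
    rng = random.Random(seed)
    cards = ["I"] * 4 + ["N"] * 3      # four 'improved', three 'not improved'
    observed_x_improved = 3            # Table 8.2.1(b): 3 of the 4 on drug X improved
    as_extreme = 0
    for _ in range(trials):
        rng.shuffle(cards)
        x_improved = cards[:4].count("I")   # the top four cards are 'drug X'
        if x_improved >= observed_x_improved:
            as_extreme += 1
    return as_extreme / trials

print("estimated one-tail P:", one_tail_p_by_shuffling())
```

The estimate fluctuates from run to run (or from seed to seed), which is one reason the exact counting methods described next are preferable when they are feasible.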

(b) Counting permutations. As each possible allocation of drugs to
patients is equally probable, if the randomization was properly done,
the results of the procedure just described can be predicted in much
the same way that the results of coin tossing were predicted in § 3.2.
If the seven patients are distinguished by numbers, the four who
improve can be numbered 1, 2, 3, and 4, and those who do not can be
numbered 5, 6, and 7. According to the null hypothesis each patient
would have given the same response whichever drug had been given.
In how many ways can the 7 be divided into groups of 3 and 4? The
answer is given, by (3.4.2), as 7!/(4!3!) = 35 ways. It is not necessary
to write out both groups since once the number improved has
been found in one group (say the smaller group, drug Y, for convenience),




Table 8.2.2

Enumeration of all 35 possible ways of selecting a group of 3 patients
from 7 to be given drug Y. Patients 1, 2, 3, and 4 improved and patients
5, 6, and 7 did not. Number of subjects improving with Y = b (see
Table 8.2.3(a))

Patients given drug Y                                Result

5 6 7                                                b = 0.
                                                     1 way giving
                                                     Table 8.2.1(a).
                                                     P = 1/35 = 0.029

1 5 6    2 5 6    3 5 6    4 5 6                     b = 1 improve.
1 5 7    2 5 7    3 5 7    4 5 7                     12 ways all giving
1 6 7    2 6 7    3 6 7    4 6 7                     Table 8.2.1(b).
                                                     P = 12/35 = 0.343

1 2 5    1 3 5    1 4 5    2 3 5    2 4 5    3 4 5   b = 2 improve.
1 2 6    1 3 6    1 4 6    2 3 6    2 4 6    3 4 6   18 ways all giving
1 2 7    1 3 7    1 4 7    2 3 7    2 4 7    3 4 7   Table 8.2.1(c).
                                                     P = 18/35 = 0.514

1 2 3    1 2 4    1 3 4    2 3 4                     b = 3 improve.
                                                     4 ways all giving
                                                     Table 8.2.1(d).
                                                     P = 4/35 = 0.114



the number improved in the other group follows from the fact that the
total number improved is necessarily 4. All 35 ways in which the drug Y
group could have been constituted are listed systematically in Table
8.2.2. If the randomization was done properly each way should have
had an equal chance of being used in the experiment. Notice that
proper randomization in conducting the experiment is crucial for the





analysis of the results. It is seen that 12 out of the 35 result in one
improved, two not improved in the drug Y group, as was actually
observed. Furthermore, 1 out of 35 shows an even more extreme
result, no patient at all improving in the drug Y group, as shown in
Table 8.2.1(a).

Thus P = 12/35 + 1/35 = 0.343 + 0.029 = 0.372 for a one-tail test†
(see § 6.1). This is the probability (the long-run proportion of repeated
experiments) that a random allocation of drugs to patients would
be picked that would give the results in Table 8.2.1(a) or 8.2.1(b), i.e.
that would give results in which X would appear as superior to Y as in
the actual experiment (Table 8.2.1(b)), or even more superior (Table
8.2.1(a)), if X and Y were, in fact, equi-effective. This probability is
not low enough to suggest that X is really better than Y. Usually a
two-tail test will be more appropriate than this one-tail test, and this is
discussed below.
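The counting of permutations can be done mechanically. The following minimal sketch (the names are invented for illustration) lists every way of choosing 3 of the 7 numbered patients for drug Y, counts how many of those chosen are improvers (patients 1 to 4), and so builds the randomization distribution of b as exact fractions.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

IMPROVED = {1, 2, 3, 4}      # patients who improve; 5, 6 and 7 do not
PATIENTS = range(1, 8)

def randomization_distribution():
    """Probability of each value of b (improvers in the drug Y group of 3)."""
    counts = Counter()
    for group_y in combinations(PATIENTS, 3):
        b = sum(1 for patient in group_y if patient in IMPROVED)
        counts[b] += 1
    total = sum(counts.values())
    return {b: Fraction(count, total) for b, count in sorted(counts.items())}

distribution = randomization_distribution()
for b, probability in distribution.items():
    print("b =", b, " P =", probability, "=", round(float(probability), 3))

# One-tail P for the observed b = 1: results at least as favourable to X.
one_tail = distribution[0] + distribution[1]
print("one-tail P =", one_tail, "=", round(float(one_tail), 3))
```

Using exact fractions avoids any rounding until the final display, so the probabilities can be compared directly with the counts of ways out of 35.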

Using the results in Table 8.2.2, the sampling distribution under the 
null hypothesis, which was assumed in constructing Table 8.2.2, is 
plotted in Fig. 8.2.1. This is the form of Fig. 6.1.1 that it is appropriate 
to consider when using the randomization approach. The variable on the 
abscissa is the number of patients improved on drug Y, i.e. b in the 
notation of Table 8.2.3(a). Given this figure the rest of the table can 
be filled in, using the marginal totals, so each value of b corresponds to a 
particular difference in percentage improvement between drugs X and 
Y. Fig. 8.2.1 is described as the randomization (or permutation) distribu- 
tion of b, and hence of the difference between samples, given the null 
hypothesis. The result of a one-tail test of significance when the 
experimentally observed value is b = 1 (Table 8.2.1(b)), is the shaded 
area (as explained in § 6.1), i.e. P = 0-372 as calculated above. 

The two-tail test. Suppose now that the result in Table 8.2.1(a)
had been found in the experiment (b = 0). A one-tail test would give
P = 1/35 = 0.029, and this is low enough for the premise of
equi-effectiveness of the drugs to be suspect if it is known beforehand that
Y cannot possibly be better than X (the opposite result to that observed).
As this is usually not known a two-tail test is needed (see § 6.1).
However, the most extreme result in favour of drug Y (b = 3 as in Table

† This is a one-tail test of the null hypothesis that X and Y are equi-effective, when
the alternative hypothesis is that X is better than Y. If the alternative to the null
hypothesis had been that Y was better than X (the alternative hypothesis must, of
course, be chosen before the experiment) then the one-tail P would have been 12/35
+ 18/35 + 4/35 = 0.971, the probability of a result as favourable to Y as that observed,
or more favourable, when X and Y are really equi-effective.






8.2.1(d)) is seen to have P = 0.114. It is therefore impossible that a
verdict in favour of drug Y could have been obtained with these
patients. If the drugs were really equi-effective then, if the hypothesis
of equi-effectiveness were rejected every time b = 0 or b = 3 (the
two most extreme results), it would be (wrongly) rejected in 2.9 + 11.4
= 14.3 per cent of trials, far too high a level for the probability of an
error of the first kind (see § 6.1, para. 7). A two-tail test is therefore
not possible with such a small sample. This difficulty, which can only
occur with very small samples (it does not happen in the next example),
has been discussed in a footnote in § 6.1 (p. 89).



Fig. 8.2.1. Randomization distribution of b (the number of patients
improving on drug Y), when X and Y are equi-effective (i.e. null hypothesis true).

(c) Direct calculation. The Fisher test. It would not be feasible to
write out all permutations for larger samples. For two samples of ten
there are 20!/(10!10!) = 184756 permutations. Fortunately it is not
necessary. If a general 2 × 2 table is symbolized as in Table 8.2.3(a)



Table 8.2.3

(a)            success   failure   total
treatment X    a         A − a     A
treatment Y    b         B − b     B
total          C         D         N

(b)            success   failure   total
treatment X    8         7         15
treatment Y    1         11        12
total          9         18        27






then Fisher has shown that the proportion of permutations giving rise 
to the table is 

\[ P = \frac{A!\,B!\,C!\,D!}{N!\,a!\,(A-a)!\,b!\,(B-b)!} \tag{8.2.1} \]

For example, for Table 8.2.1(b), P = 4!3!4!3!/(7!3!1!1!2!) = 12/35
= 0.343 as already found. With larger figures (8.2.1) is most
conveniently evaluated using tables of logarithms of factorials (e.g. Fisher
and Yates, 1963).
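As a check on the arithmetic, (8.2.1) can be evaluated directly with exact integer factorials instead of tables of logarithms. The sketch below is illustrative only: the function name `table_probability` is an assumption, and the cell values for Table 8.2.1(b) are read off the worked formula above (a = 3, A − a = 1, b = 1, B − b = 2).

```python
from fractions import Fraction
from math import factorial


def table_probability(a, b, A, B):
    """Proportion of permutations giving the 2 x 2 table, from (8.2.1)."""
    C = a + b                # total number of successes
    D = (A - a) + (B - b)    # total number of failures
    N = A + B                # total number of subjects
    numerator = factorial(A) * factorial(B) * factorial(C) * factorial(D)
    denominator = (factorial(N) * factorial(a) * factorial(A - a)
                   * factorial(b) * factorial(B - b))
    return Fraction(numerator, denominator)


# Table 8.2.1(b): A = 4, B = 3, a = 3, b = 1
p = table_probability(3, 1, 4, 3)
print(p, round(float(p), 3))

# Number of ways of dividing 20 subjects into two samples of ten
n_permutations = factorial(20) // (factorial(10) ** 2)
print(n_permutations)
```

Using `Fraction` keeps the result exact (12/35), so no rounding enters until the final conversion to a decimal.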

In fact no calculation at all is necessary as tables have been published
(Finney, Latscha, Bennett, and Hsu, 1963) for testing any 2 × 2 table
with A and B, or C and D, both not more than 40. Unfortunately, to
keep the tables a reasonable size it is not possible to find the exact P
value for all 2 × 2 tables, but it is given for those 2 × 2 tables with
marginal totals up to 30 for which P ≤ 0.05 (one tail). The published
tables are for B ≤ A and b ≤ a only, to avoid duplication. If the
table to be tested does not comply with this, rows and/or columns
must be interchanged until it does. As an example, the table in Table
8.2.3(b), which is from the introduction to the tables of Finney et al.
(1963), is tested using the appropriate part of their table, which has been
reproduced in Table 8.2.4.

Table 8.2.4

Exact test for the 2 × 2 table (extract from tables of Finney et al. (1963))

                            Probability (nominal)
                   a     0.05        0.025       0.01        0.005
                         b  P        b  P        b  P        b  P

A = 15, B = 12     15    8  0.028    7  0.010−   7  0.010−   6  0.003
                   14    7  0.043    6  0.016    5  0.006    4  0.002
                   13    6  0.049    5  0.019    4  0.007    3  0.002

                   9     2  0.028    1  0.007    1  0.007    0  0.001
                   8     1  0.018    1  0.018    0  0.003    0  0.003
                   7     1  0.038    0  0.007    0  0.007



The observed Table 8.2.3(b) has A = 15, B = 12, and a = 8.
Entering Table 8.2.4 with these values shows under each nominal





probability a figure in bold type which is the largest value of b that is just
'significant in a one-tail test at the 5 per cent (or 2.5, 1, or 0.5 per cent)
level', i.e. for which the one-tail† P ≤ 0.05 (or 0.025, 0.01, or 0.005).
The exact value of P is given in smaller type. It is the nearest value,
given that b must be a whole number, that is not greater than the
nominal value. In this example the one-tail P corresponding to the
observed b = 1 is 0.018. This is the sum of the P values calculated
from (8.2.1) for the observed table (a = 8, b = 1, P = 0.017), and the
only possible more extreme one with the same marginal totals (a = 9,
b = 0, P = 0.001). To find the two-tail P value (see § 6.1 and above)
consider the distribution of b analogous to Fig. 8.2.1. In this case b
can vary from 0 to 9 and if the null hypothesis were true it would be
4 on the average (see § 8.5). The one-tail P found is the tail of the
distribution for b ≤ 1. It is required to cut off an area as near as
possible to this in the other tail of the distribution (b > 4), as in Fig. 6.1.1.
No value of b cuts off exactly P = 0.018 but b = 7 cuts off an area of
P = 0.019 that is near enough (see footnote, § 6.1, p. 89). This is the
sum of the probabilities of b = 7 and all the more extreme (b = 8 and
b = 9) results. It can be found from the tables of Finney et al. by the
method described in their introduction. The table has a = 2, b = 7,
A − a = 13, B − b = 5 so columns are interchanged, as mentioned
above, and the table entered with 13 and 5 rather than 2 and 7, as
marked in Table 8.2.4. Therefore if it were resolved to reject the null
hypothesis whenever b ≤ 1 (as observed) or when b ≥ 7 (opposite tail)
then, if the null hypothesis was in fact true, the probability that it
would be rejected (wrongly), an error of the first kind, would be
P = 0.018 + 0.019 = 0.037. This result for the two-tail test is small
enough to make one question the null hypothesis, i.e. to suspect a
real difference between the treatments (see § 6.1).

In practice, if the samples are not too small, it would be adequate, 
and much simpler, to double the one-tail P from the table to get the 
required two-tail P. 
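The tail areas quoted for Table 8.2.3(b) can be reproduced by summing the probability of every possible value of b with the marginal totals held fixed. The sketch below is one way of doing this, not the method of Finney et al.; the helper name `prob_b` is an assumption, and the binomial-coefficient form used is algebraically the same as (8.2.1).

```python
from math import comb


def prob_b(b, A, B, C):
    """Probability of b successes in sample Y, margins fixed, null hypothesis true."""
    return comb(B, b) * comb(A, C - b) / comb(A + B, C)


A, B, C = 15, 12, 9   # marginal totals of Table 8.2.3(b)
possible_b = range(max(0, C - A), min(B, C) + 1)
dist = {b: prob_b(b, A, B, C) for b in possible_b}

lower = sum(p for b, p in dist.items() if b <= 1)   # tail in the observed direction
upper = sum(p for b, p in dist.items() if b >= 7)   # opposite tail
mean_b = sum(b * p for b, p in dist.items())        # average b on the null hypothesis

print(round(lower, 3), round(upper, 3), round(lower + upper, 3))
print(round(mean_b, 3))
```

The two tails come out close to 0.018 and 0.019, and the mean of b is 4, in agreement with the values quoted in the text.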

8.3. The problem of unacceptable randomizations

Sometimes it will be found that when two samples are selected at 
random one sample contains, for example, all the men and the other 
all the women. In fact if this does not happen sometimes, the selection 
cannot be random. It seems silly to carry out an experiment in which 

† For the case when it is decided, before the experiment, that the only alternative to
the null hypothesis is a difference between X and Y in the observed direction.






treatment X is given only to men and treatment Y only to women. 
Yet the logical basis of significance tests will be destroyed if the 
experimenter rejects randomizations producing results he does not 
like. Often this will be preferable to the alternative of doing an experi- 
ment that is, on scientific grounds, silly. But it should be realized that 
the choice must be made. 

There is a way round the problem if randomization tests are used.
If it is decided beforehand that any randomization that produces two
samples differing to more than a specified extent in sex composition
(or weight, age, prognosis, or any other criterion) is unacceptable then,
if such a randomization comes up when the experiment is being
designed, it can be legitimately rejected if, in the analysis of the results,
those of the possible randomizations that differ excessively, according to
the previously specified criteria, are also rejected, in exactly the same
way as when the real experiment was done. So, in the case of Table
8.2.2, the number of possible allocations of drugs to the 7 patients
could be reduced to less than 35. This can only be done when using
the method of physical randomization, or a computer simulation
of this process, or writing out the permutations as in Table 8.2.2.
The shorter methods using calculation (e.g. from (8.2.1)), or published
tables (e.g. for the Fisher exact test, § 8.2, or the Wilcoxon tests,
§§ 9.3 and 10.4), cannot be modified to allow for rejection of
randomizations.
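The restricted randomization just described can be sketched by writing out every allocation, discarding those that fail the criterion fixed beforehand, and counting over the remainder. In the sketch below only the split of 7 patients into 4 improved and 3 not improved comes from the text; the sexes of the patients, the rule that the group given Y must contain both sexes, and the function names are all hypothetical.

```python
from itertools import combinations

# (improved?, sex) for each of the 7 patients; the sexes are invented for illustration
patients = [(True, "M"), (True, "M"), (True, "F"), (True, "F"),
            (False, "M"), (False, "F"), (False, "F")]


def acceptable(y_group):
    """Assumed criterion: the 3 patients given drug Y must include both sexes."""
    return len({patients[i][1] for i in y_group}) == 2


def randomization_p(observed_b, restrict):
    """Return (allocations with b <= observed_b, number of allocations considered)."""
    allocations = [group for group in combinations(range(len(patients)), 3)
                   if not restrict or acceptable(group)]
    extreme = sum(1 for group in allocations
                  if sum(patients[i][0] for i in group) <= observed_b)
    return extreme, len(allocations)


print(randomization_p(0, restrict=False))   # all 35 allocations
print(randomization_p(0, restrict=True))    # acceptable allocations only
```

With these invented sexes five of the 35 allocations are rejected, so the one-tail P for b = 0 becomes 1/30 rather than 1/35.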

8.4. Two independent samples. Use of the normal approximation 

Although the reasoning in § 8.2 is perfectly logical, and although 
there is a great deal to be said for restricting attention to the observa- 
tions actually made since it is usually impossible to ensure that any 
further observations will come from the same population (see §§1.1 
and 7.2), the exact test has nevertheless given rise to some controversy 
among statisticians. It is possible to look at the problem differently. 
If, in the example in Table 8.2.1, the 7 patients were thought of as 
being selected from a larger population of patients then another sample 
of 7 would not, in general, contain 4 who improved and 3 who did not. 
This is considered explicitly in the approximate method described in 
this section. However there is reason to suppose that the exact test
of § 8.2 is best, even for 2 × 2 tables in which the marginal totals are
not fixed (Kendall and Stuart 1961, p. 554).

Consider Table 8.2.3 again but this time imagine two infinite popula- 
tions (e.g. X-treated and Y-treated) with true probabilities of success 






(e.g. improved) 𝒫₁ and 𝒫₂ respectively. From the first population a
sample of A individuals is drawn at random and is observed to contain a
successes (e.g. improved patients). Similarly b successes out of B are
observed in the sample from the second population. The experimental
estimates of 𝒫₁ and 𝒫₂ are, as in § 3.4, p₁ = a/A and p₂ = b/B, the
observed proportions of successes in the samples from the two
populations. In repeated trials a and b should vary as predicted by the binomial
distribution (see § 3.4).

Use of the normal approximation to the binomial 

It is required to test the null hypothesis that 𝒫₁ = 𝒫₂, both being
𝒫 say. If this were so then on the average the observed proportions
would be the same too, so p₁ − p₂ would be distributed about a mean
value of zero (cf. Fig. 6.1.1). It was mentioned in § 3.4 (and illustrated
in Fig. 3.4.1) that if n is reasonably large the discontinuous binomial
distribution of p is quite well approximated by a continuous normal
distribution. It will therefore be supposed, as an approximation, that
p₁ and p₂ are both normally distributed. This implies that the difference
between them (p₁ − p₂) will be normally distributed with, according to
the null hypothesis, a population mean (μ) of zero. The standard
deviation of this distribution can now be found by using (3.4.5) to
find the true variances of p₁ and p₂ which, given the null hypothesis,
are

\[ \operatorname{var}(p_1) = \frac{\mathcal{P}(1-\mathcal{P})}{A}, \qquad \operatorname{var}(p_2) = \frac{\mathcal{P}(1-\mathcal{P})}{B}. \tag{8.4.1} \]

If p₁ and p₂ are independent, as they will be if the samples are
independent as assumed (cf. § 6.4), the variance of their difference will be,
using (2.7.3),

\[ \operatorname{var}(p_1 - p_2) = \operatorname{var}(p_1) + \operatorname{var}(p_2) = \mathcal{P}(1-\mathcal{P})\left(\frac{1}{A} + \frac{1}{B}\right). \tag{8.4.2} \]

The true value, 𝒫, is, of course, unknown, and it must be estimated
from the experimental results. No allowance is made for this, which is
another reason why the method is only approximate. The natural
estimate of 𝒫, under the null hypothesis, is to pool the two samples
and divide the total number of successes by the total number of trials
(e.g. total number improved by total number of patients), i.e.
p = (a+b)/(A+B). Thus, taking x = (p₁ − p₂) as the normal variable,
with, according to the null hypothesis, μ = 0, an approximate normal
deviate (see § 4.3) can be calculated, using (4.3.1) and (8.4.2). This
value of u can then be referred to tables of the standard normal
distribution (see § 4.3).

\[ u = \frac{x}{\sigma(x)} \simeq \frac{p_1 - p_2}{\sqrt{p(1-p)(1/A + 1/B)}} \tag{8.4.3} \]

Applying this method to the results in Table 8.2.3 gives p₁ = a/A
= 8/15, p₂ = b/B = 1/12, p = (a+b)/(A+B) = 9/27 and so, using
(8.4.3), the approximate normal deviate is

\[ u \simeq \frac{8/15 - 1/12}{\sqrt{\frac{9}{27}\left(1 - \frac{9}{27}\right)\left(\frac{1}{15} + \frac{1}{12}\right)}} = 2.4648. \]



According to Table 1 of the Biometrika tables† about 1.4 per cent of the
area of the standard normal distribution lies outside u = ±2.4648
(0.7 per cent in each tail). The result of the test, P = 0.014, is seen to
be a poor approximation to the exact result, P = 0.037, found at the
end of § 8.2. A better approximation can be found by using a 'correction
for continuity' and this should always be done.
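The uncorrected calculation of (8.4.3) is short enough to check by machine. In this sketch the function names are assumptions, and the two-tail area is obtained from the complementary error function in Python's standard library rather than from the Biometrika tables.

```python
from math import erfc, sqrt


def normal_deviate(a, A, b, B):
    """Approximate normal deviate u of (8.4.3), without continuity correction."""
    p1, p2 = a / A, b / B
    p = (a + b) / (A + B)     # pooled estimate of the proportion of successes
    return (p1 - p2) / sqrt(p * (1 - p) * (1 / A + 1 / B))


def two_tail_p(u):
    """Area of the standard normal distribution lying outside -|u| and +|u|."""
    return erfc(abs(u) / sqrt(2))


u = normal_deviate(8, 15, 1, 12)   # Table 8.2.3(b)
print(round(u, 4), round(two_tail_p(u), 3))
```

This gives u close to 2.4648 and a two-tail P of about 0.014, as stated above.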

Yates' correction for continuity 

Say p = r/n in general. It can be shown (e.g. Brownlee (1965, pp.
139, 152)) that the approximation of the continuous normal distribution
to the discontinuous binomial is improved if 0.5 is added to or
subtracted from r (or 0.5/n is added to or subtracted from p), so as to
make the deviation from the null hypothesis smaller. Thus a better
approximation than (8.4.3) for the normal deviate is

\[ u \simeq \frac{(p_1 - 0.5/A) - (p_2 + 0.5/B)}{\sqrt{p(1-p)(1/A + 1/B)}} \tag{8.4.4} \]

where p₁ > p₂. Using the results in Table 8.2.3(b) again gives u = 2.054.
Again using Table 1 of the Biometrika tables it is found that 4.0 per
cent of the total area of the standard normal distribution lies outside
u = ±2.054 as shown in Fig. 8.4.1 (cf. Fig. 6.1.1). In other words, in
repeated experiments it would be expected, if the null hypothesis were
true, that in 2.0 per cent of experiments u would be less than −2.054,

† This table actually gives the area below u = +2.4648, i.e. 1 − 0.007 = 0.993.
See § 4.8 for details.






and in 2.0 per cent u would be greater than +2.054. This is a two-tail
test (see § 6.1).

The result of the test. The probability of observing a difference (positive or
negative) in success rate between the sample from population 1 (X-treated) and
that from population 2 (Y-treated) as large as, or larger than, the observed
sample difference, if there were no real difference between the treatments
(populations), would be approximately 0.04, a 1 in 25 chance.
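The corrected deviate of (8.4.4) can be sketched in the same way. The function name is an assumption, and the absolute value is used so that the order of the two samples does not matter; the deviation is shrunk towards zero but never past it.

```python
from math import erfc, sqrt


def corrected_deviate(a, A, b, B):
    """Normal deviate of (8.4.4), with Yates' correction for continuity."""
    p1, p2 = a / A, b / B
    p = (a + b) / (A + B)
    # Move each proportion 0.5/n towards the other, i.e. reduce |p1 - p2|
    difference = max(abs(p1 - p2) - 0.5 / A - 0.5 / B, 0.0)
    return difference / sqrt(p * (1 - p) * (1 / A + 1 / B))


u_corrected = corrected_deviate(8, 15, 1, 12)      # Table 8.2.3(b)
p_two_tail = erfc(u_corrected / sqrt(2))            # area outside +/- u
print(round(u_corrected, 3), round(p_two_tail, 3))
```

The result, u about 2.054 with P about 0.04, is much nearer the exact two-tail value of 0.037 than the uncorrected figure.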

The corrected result, P = 0.04, is quite a good approximation to
the exact probability, P = 0.037, found at the end of § 8.2. It is low
enough to make one suspect (without very great confidence) a real
difference between the treatments.

Fig. 8.4.1. Normal approximation to the binomial. Difference between two
binomial proportions is converted to an approximate normal deviate, u, and
referred to the standard Gaussian curve shown in the figure. (Horizontal
axis: standard normal deviate, u.)

Nonparametric nature of test. Although the normal distribution is 
used, the test just described is still essentially a nonparametric test. 
This is because the fundamental assumption is that the proportion of 
successes is binomially distributed and this can be assured by proper 
sampling methods. The normal distribution is used only as a mathe- 
matical approximation to the binomial distribution. 

8.5. The chi-squared (χ²) test. Classification measurements
with two or more independent samples

The probability distribution followed by the sum of squares of f independent
standard normal variates (i.e. Σ(x_i − μ_i)²/σ_i², where x_i is normally distributed with a
mean of μ_i and a standard deviation of σ_i, see § 4.8), is called the chi-squared
distribution with f degrees of freedom, denoted χ²(f). As suggested by the definition,
the scatter seen in estimates of the population variance, calculated from repeated
samples from a normally distributed population, follows the χ² distribution. In
fact χ² = fs²/σ², where s² is an estimate of σ² based on f degrees of freedom. The
consequent use of χ² for testing hypotheses about the true variance of such a
population is described, for example, by Brownlee (1965, p. 282).

In the special case of f = 1 d.f., one has χ²(1) = u², the square of a
single standard normal variate. Tables of the distribution of chi
squared with one degree of freedom can therefore be used, by squaring
the values of u found in § 8.4, as an approximate test for the 2 × 2
table. In practice χ²(1) is not usually calculated by the method given
for the calculation of u, but by another method which, although it
does not look related at first sight, gives exactly the same answer, as
will be seen. The conventional method of calculation to be described
has the advantage that it can easily be extended to larger tables of
classification measurements than 2 × 2. An example is given below.

The form in which χ² is most commonly encountered is that
appropriate for testing (approximately) goodness of fit, and tables of
classification measurements (contingency tables). If x_o is an observed
frequency and x_e is the expected value of the frequency on some
hypothesis, then it can be shown that the quantity

\[ \chi^2 = \sum \frac{(x_o - x_e)^2}{x_e}, \tag{8.5.1} \]

which measures the discrepancy between observation and hypothesis,
is distributed approximately like χ². This approach will be used to test
the 2 × 2 table (Table 8.2.3) that has already been analysed in §§ 8.2
and 8.4.

The expected values, x_e, of the frequencies, given the null hypothesis
that the proportion of successes is the same in both populations, are
calculated as follows. The best estimate of this proportion of successes
is, as found in § 8.4, p = (a+b)/(A+B) = 9/27 = 0.3333. Therefore,
if the null hypothesis were true, the best estimate of the number† of
successes in the sample from population 1 (e.g. number of patients
improved on drug X) would be 0.3333 × 15 = 5, and similarly the
expected number for population 2 (e.g. drug Y) would be 0.3333 × 12
= 4. The original table of observations, and the table of values expected
on the null hypothesis are thus:

Observed (x_o)   success   failure   total
Population 1     8         7         15
Population 2     1         11        12
Total            9         18        27

Expected (x_e)   success   failure   total
Population 1     5         10        15
Population 2     4         8         12
Total            9         18        27

† This need not be a whole number (see Table 8.5.1(b) for example). It is a predicted
long-run average frequency. The individual frequencies must, of course, be integers.



The summation in (8.5.1) is over all of the cells of the table. The
differences (x_o − x_e) are 8 − 5 = 3, 1 − 4 = −3, 7 − 10 = −3, and
11 − 8 = 3. Thus, from (8.5.1),

\[ \chi^2 = \frac{3^2}{5} + \frac{(-3)^2}{4} + \frac{(-3)^2}{10} + \frac{3^2}{8} = 6.075. \]



This is a value of χ² with one degree of freedom, because only one
expected value need be calculated, the rest following by difference
from the marginal totals. It is, as expected, exactly the square of the
value of u found in § 8.4, 2.4648² = 6.075.
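The conventional calculation can be sketched as follows: the expected frequency for each cell is (row total × column total)/N, and (8.5.1) is summed over all cells. The function names `expected_frequencies` and `chi_squared` are assumptions made for this illustration.

```python
def expected_frequencies(table):
    """Expected frequency for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(table):
    """Sum of (x_o - x_e)^2 / x_e over all cells, as in (8.5.1), uncorrected."""
    expected = expected_frequencies(table)
    return sum((x_o - x_e) ** 2 / x_e
               for obs_row, exp_row in zip(table, expected)
               for x_o, x_e in zip(obs_row, exp_row))


observed = [[8, 7],     # population 1: success, failure
            [1, 11]]    # population 2: success, failure
print(expected_frequencies(observed))
print(round(chi_squared(observed), 3))
```

For the observed table this reproduces the expected frequencies 5, 10, 4, and 8 and the uncorrected value of 6.075.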



Correction for continuity 

As in § 8.4, this approximate test for association in a 2 × 2 table
should not be applied without using the correction for continuity.
Simply reduce the absolute values of the deviations by 0.5 giving

\[ \chi^2 = \frac{2.5^2}{5} + \frac{(-2.5)^2}{4} + \frac{(-2.5)^2}{10} + \frac{2.5^2}{8} = 4.219. \]

Again it is seen that this is exactly the square of the (corrected) value of
u found in § 8.4, u² = 2.054² = 4.219 = χ². This can be referred
directly to a table (e.g. Fisher and Yates, 1963, Table IV; or Pearson
and Hartley, 1966, Table 8) of the chi-squared distribution which,
for one degree of freedom, has the appearance shown in Fig. 8.5.1.
It is found that 4.0 per cent of the area under the curve lies above
χ² = 4.219 (because of the way the tables are constructed the most
accurate value that can be found from them is that the area is a little
less than 5.0 per cent, i.e. 0.05 > P > 0.025). This is exactly the same
as found in § 8.4 as it should be, since the χ² test for the 2 × 2 table is just
another way of writing the test using the normal approximation to the
binomial. The result of the test states that if the null hypothesis were
true, a value of χ²(1) as large as 4.219 or larger would be found in
4.0 per cent of repeated experiments in the long run. This casts a
certain amount of suspicion on the null hypothesis as explained in
§ 8.4.
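The corrected value of chi-squared, and the area above it, can be checked with the sketch below. The function names are assumptions. For one degree of freedom the area above χ² equals the two-tail area of the standard normal distribution outside ±√χ², which is why the complementary error function can stand in for the printed table here; this shortcut applies to 1 d.f. only.

```python
from math import erfc, sqrt


def yates_chi_squared(table):
    """Chi-squared for a 2 x 2 table with each |deviation| reduced by 0.5."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, x_o in enumerate(row):
            x_e = row_totals[i] * col_totals[j] / n
            deviation = max(abs(x_o - x_e) - 0.5, 0.0)   # never past zero
            total += deviation ** 2 / x_e
    return total


def p_one_df(chi2):
    """Area above chi2 for chi-squared with one degree of freedom."""
    return erfc(sqrt(chi2 / 2))


chi2_corrected = yates_chi_squared([[8, 7], [1, 11]])
print(round(chi2_corrected, 3), round(p_one_df(chi2_corrected), 3))
```

This gives 4.219 and P about 0.04, the same probability as the corrected normal deviate of § 8.4.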

It should be noticed that the probability found using χ² is that
appropriate for a two-tail test of significance (as shown in § 8.4,
Fig. 8.4.1, cf. § 6.1) in spite of the fact that only one tail of the χ²
distribution is considered in Fig. 8.5.1. This is because χ² involves the
squares of deviations, so deviations from the expected values in either
direction increase χ² in the same direction.

Fig. 8.5.1. The distribution of chi-squared. The observed value, 4.219, for
chi-squared with one degree of freedom (see text) would be exceeded in only 4
per cent of repeated experiments in the long run if the null hypothesis were true.
The distribution for 4 degrees of freedom is also shown. (See Chapter 4 for
explanation of probability density.)




Use of chi-squared for testing association in tables of classification
measurements larger than 2 × 2

If the results of treatments X and Y had been classified in more than
two ways, for example success, no change, or failure, the experiment
shown in Table 8.2.3(b) might have turned out as in Table 8.5.1(a).





Table 8.5.1

(a) observed
                success   no change   failure   total
Treatment X     8         3           4         15
Treatment Y     1         5           6         12
total           9         8           10        27

(b) expected on null hypothesis
                success   no change   failure   total
Treatment X     5         4.4         5.6       15
Treatment Y     4         3.6         4.4       12
total           9         8           10        27


A proper randomization analysis could be done similar to that in
§ 8.2, but no tables exist to shorten the calculations for tables larger
than 2 × 2. Often two or more columns or rows can be pooled (giving,
for example, Table 8.2.3(b) again) to give 2 × 2 tables, which may
answer the relevant questions. For example, is the proportion of
success and of [no change or failure] the same for X and Y? This
question is answered by the test of Table 8.2.3(b).

Table 8.5.1(a) itself can be tested using the χ² approximation,
which is quite sufficiently accurate if all the expected frequencies are
at least 5. (They are not in this case; the test is not really safe with
such small numbers.) On the null hypothesis that the proportion of
successes is the same from treatments X and Y this proportion would
be estimated as 9/27 = 0.3333. So the number of successes expected on
the null hypothesis, when 15 individuals are treated with X, is 0.3333
× 15 = 5. Proceeding similarly for 'no change' and 'failure' gives
Table 8.5.1(b). Thus, from (8.5.1),

\[ \chi^2_{(2)} = \frac{(8-5)^2}{5} + \frac{(3-4.4)^2}{4.4} + \cdots + \frac{(6-4.4)^2}{4.4} = 6.086. \]






Note that no correction for continuity is used for tables larger than
2 × 2. χ² has two degrees of freedom since only two cells can be filled
in Table 8.5.1(b), the rest then follow from the marginal totals.
Consulting a table of the χ² distribution (e.g. Fisher and Yates 1963, Table
IV) shows that a value of χ² (with 2 d.f.) equal to or larger than 6.086
would occur in slightly less than 5 per cent of trials in the long run, if
the null hypothesis were true; i.e. for a two-tail test 0.025 < P < 0.05.
This is small enough to cast some suspicion on the null hypothesis.

Independence of classifications (e.g. of treatment type and success
rate) is tested in larger tables in an exactly analogous way, χ² being the
sum of rk terms, and having (r − 1)(k − 1) degrees of freedom, for a
table with r rows and k columns.
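For a table with r rows and k columns the same calculation applies unchanged. The sketch below, with an assumed function name, returns χ² together with its (r − 1)(k − 1) degrees of freedom for Table 8.5.1(a). For exactly two degrees of freedom the area above χ² has the closed form exp(−χ²/2), which is used here in place of the printed table; that shortcut does not hold for other numbers of degrees of freedom.

```python
from math import exp


def chi_squared_table(table):
    """Return (chi-squared, degrees of freedom) for an r x k table of frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, x_o in enumerate(row):
            x_e = row_totals[i] * col_totals[j] / n   # expected on null hypothesis
            chi2 += (x_o - x_e) ** 2 / x_e
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


chi2, df = chi_squared_table([[8, 3, 4],    # treatment X
                              [1, 5, 6]])   # treatment Y
p = exp(-chi2 / 2)                          # valid for 2 d.f. only
print(round(chi2, 3), df, round(p, 3))
```

The value of P obtained lies between 0.025 and 0.05, as found from the tables.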

8.6. One sample of observations. Testing goodness of fit with 
chi-squared 

In examples in §§ 8.2-8.5 two (r in general) samples (treatments)
were considered. Each sample was classified into two or more (k in
general) ways. The chi-squared approximation is the most convenient
method of testing a single sample of classification measurements, to
see whether or not it is reasonable to suppose that the number of
subjects (or objects, or responses) that are observed to fall in each of the
k classes is consistent with some hypothetical allocation into classes
that one is interested in.

For example, suppose that it were wished to investigate the (null)
hypothesis that a die was unbiased. If it were tossed say 600 times the
expected frequencies, on the null hypothesis, of observing 1, 2, 3, 4, 5, and 6
(the k = 6 classes) would all be 100, so x_e is taken as 100 for each class in
calculating the value of eqn. (8.5.1). The observed frequencies are the
x_o values. The value of eqn. (8.5.1) would have, approximately, the
chi-squared distribution with k − 1 = 5 degrees of freedom if the null
hypothesis were true, so the probability of finding a discrepancy
between observation and expectation at least as large as that observed
could be found from tables of the chi-squared distribution as above.
(See also numerical example below.)
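A minimal sketch of the die example follows. The observed counts are invented purely for illustration (they are not from the text), and the function name is an assumption. The critical value 11.07 is the tabulated 5 per cent point of chi-squared with 5 degrees of freedom.

```python
def goodness_of_fit(observed, expected):
    """Chi-squared of (8.5.1) for one sample classified into k classes."""
    return sum((x_o - x_e) ** 2 / x_e for x_o, x_e in zip(observed, expected))


# Hypothetical results of 600 tosses: number of times 1, 2, ..., 6 was observed
observed = [95, 108, 92, 103, 110, 92]
expected = [sum(observed) / 6] * 6        # 100 per class if the die is unbiased

chi2 = goodness_of_fit(observed, expected)
df = len(observed) - 1                    # k - 1 = 5
CRITICAL_5_PER_CENT = 11.07               # tabulated value for 5 d.f.

print(round(chi2, 2), df, chi2 > CRITICAL_5_PER_CENT)
```

With these invented counts χ² is well below the 5 per cent point, so there would be no reason to doubt that the die was unbiased.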

As another example suppose that it were wished to investigate the
(null) hypothesis that all students in a particular college are equally
likely to have smoked, whatever their subject. Again the null hypothesis
specifies the number of subjects expected to fall into each of the k
classes (physics, medicine, law, etc.). If there are 500 smokers altogether
the observed numbers in physics, medicine, law, etc. are the x_o values
in eqn. (8.5.1). The expected numbers, on the null hypothesis, are
found much as above. The example is more complicated than in the
case of tossing a die, because different numbers of students will be
studying each subject and this must obviously be allowed for. The total
number of smokers divided by the total number of students in the
college gives the proportion of smokers that would be expected in each
class if the null hypothesis were true, so multiplying this proportion
by the number of people in each class (the number of physics students
in the college, etc.) gives the expected frequencies, x_e, for each class.
The value calculated from (8.5.1) can be referred to tables of the
chi-squared distribution with k − 1 degrees of freedom, as before.

A numerical example. Goodness of fit of the Poisson distribution 

The chi-squared approximation, it has been stated, can be used to
test whether the frequencies of observations in each class differ by an
unreasonable amount from the frequencies that would be expected if
the observations followed some theoretical distribution such as the
binomial, Poisson, or Gaussian distributions. In the examples just
mentioned, the theoretical distribution was the rectangular ('equally
likely') distribution, and only the total number of observations (e.g.
number of smokers) was needed to find the expected frequencies. In
§ 3.6 the question was raised of whether or not it was reasonable to
believe that the distribution of red blood cells in the haemocytometer
was a Poisson distribution, and this introduces a complication. The
determination of the expected frequencies, described in § 3.6, needed
not only the total number of observations (80, see Table 3.6.1), but
also the observed mean r̄ = 6.625 which was used as an estimate of m
in calculating the frequencies expected if the hypothesis of a Poisson
distribution were true. The fact that an arbitrary parameter estimated
from the observations (r̄ = 6.625) was used in finding the expected
frequencies gives them a better chance of fitting the observations than
if they were calculated without using any information from the
observations themselves, and it can be shown that this means that the number of
degrees of freedom must be reduced by one, so in this sort of test
chi-squared has k − 2 degrees of freedom rather than k − 1.

Categories are pooled as shown in Table 3.6.1 to make all calculated
frequencies at least 5, because this is a condition (mentioned above)
for χ² to be a good approximation. Taking the observed frequency as
x_o and the calculated frequency (that expected on the hypothesis that
cells are Poisson distributed) as x_e gives, using (8.5.1) with the results
in Table 3.6.1,

\[ \chi^2 = \sum \frac{(x_o - x_e)^2}{x_e} = 14.7. \]

The number of degrees of freedom, in this case, is the number of
classes (k = 9 after pooling) minus two, as mentioned above. There
are therefore 7 degrees of freedom. Looking up tables of the χ²
distribution shows that P[χ²(7) ≥ 14.7] ≃ 0.05. This means that if the true
distribution of red cells were a Poisson distribution, then an apparent
deviation from the calculated distribution (measured by χ²) as large as,
or larger than, that observed in the present experiment would be
expected to arise by random sampling error in only about 5 per cent of
experiments. This is not low enough (see § 6.1) for one to feel sure that
the premise that the distribution is Poissonian must be wrong, though
it is low enough to suggest that further experiments might lead to that
conclusion.
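The steps of this test (estimate m from the observed mean, calculate the Poisson frequencies, pool classes until every expected frequency is at least 5, then use k − 2 degrees of freedom) can be sketched as below. The counts are hypothetical and are not those of Table 3.6.1; the function name and the left-to-right pooling rule are assumptions made for the illustration.

```python
from math import exp, factorial


def poisson_goodness_of_fit(counts, min_expected=5.0):
    """counts[r] = number of observations equal to r. Returns (chi-squared, d.f.)."""
    n = sum(counts)
    mean = sum(r * c for r, c in enumerate(counts)) / n   # estimate of m
    expected = [n * exp(-mean) * mean ** r / factorial(r)
                for r in range(len(counts))]
    expected[-1] = n - sum(expected[:-1])   # last class takes the whole upper tail

    # Pool adjacent classes until each pooled expected frequency >= min_expected
    pooled_obs, pooled_exp = [], []
    obs_acc = exp_acc = 0.0
    for x_o, x_e in zip(counts, expected):
        obs_acc += x_o
        exp_acc += x_e
        if exp_acc >= min_expected:
            pooled_obs.append(obs_acc)
            pooled_exp.append(exp_acc)
            obs_acc = exp_acc = 0.0
    if exp_acc > 0:                         # leftover tail joins the last class
        pooled_obs[-1] += obs_acc
        pooled_exp[-1] += exp_acc

    chi2 = sum((x_o - x_e) ** 2 / x_e for x_o, x_e in zip(pooled_obs, pooled_exp))
    return chi2, len(pooled_obs) - 2        # one d.f. lost for estimating m


# Hypothetical counts of r = 0, 1, ..., 8 cells per square (80 squares in all)
counts = [2, 8, 14, 20, 16, 10, 6, 3, 1]
chi2, df = poisson_goodness_of_fit(counts)
print(round(chi2, 2), df)
```

With these invented counts the nine classes pool to six, so chi-squared is referred to the distribution with 6 − 2 = 4 degrees of freedom.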

8.7. Related samples of classification measurements. Cross-over 
trials 

Consider Table 8.7.1(a), which is based on an example discussed by 
Mainland (1963, p. 236). It looks just like the 2 × 2 tables previously
presented. In fact it is not, because it was not based on two independent
samples of 12. There were actually only 12 patients and not 24. Each
patient was given both X and Y (in a random order). This is described
as a cross-over trial because those (randomly chosen) patients who 
were given X first (period 1) were subsequently (period 2) given Y, and 
vice versa. Table 8.7.1(a) is an incorrect presentation of the results 
because it disguises this fact. Table 8.7.1(b) is a correct way of giving 
the results, and it contains more information since 8.7.1(a) can be 
constructed from it, whereas 8.7.1(b) cannot be constructed from 
8.7.1(a). Table 8.7.1(b) cannot be tested in the way described for 
independent samples either. The 5 patients who reacted in the same 
way to both X and Y contribute no information about the difference 
between the drugs, only the 7 who reacted differently to X and Y do so. 
Furthermore, the possibility that the result depends on whether X or 
Y was given first can be taken into account. The correct method of 
analysis is described clearly by Mainland (1963 p. 236). The full results, 
which have been condensed into 8.7.1(b), were as given in Table 
8.7.2. Note that of the 12 patients half (6 selected at random from the
12) have been assigned to the XY sequence, the other half to the
YX sequence. These results can be arranged in a 2 × 2 table, 8.7.3(a),
consisting of two independent samples. A randomization (exact) test
or χ² approximation applied to this table will test the null hypothesis
that the proportion improving in the first period is the same whether
X or Y was given in the first period, i.e. that the drugs are equi-effective.
The test has been described in detail in § 8.2 (the table is the same as
Table 8.2.1(a)), where it was found that P (one tail) = 0.029 but that
the sample is too small for a satisfactory two-tail test, which is what is
needed. In real life a larger sample would have to be used.

Table 8.7.1

(a)          improved (I)   not improved (N)   total
Drug X       12             0                  12
Drug Y       5              7                  12
total        17             7                  24

(b)                   Drug Y
                      I      N      total
Drug X    I           5      7      12
          N           0      0      0
total                 5      7      12

Table 8.7.2

Patients showing improvement   X in period (1)   X in period (2)   Totals
In both periods                3                 2                 5
In period (1) not in (2)       3                 0                 3
In period (2) not in (1)       0                 4                 4
In neither period              0                 0                 0
Totals                         6                 6                 12






The subjects showing the same response in both periods give no
information about the difference between drugs but they do give
information about whether the order of administration matters.
Table 8.7.3(b) can be used to test (with the exact test or chi squared)
the null hypothesis that the proportion of patients giving the same
result in both periods does not depend on whether X or Y was given
first. Clearly there is no evidence against this null hypothesis. If, for
example, drug X had a very long-lasting effect, it would have been
found that those patients given X first tended to give the same result
in both periods because they would still be under its influence during
period 2.

Table 8.7.3

(a)                Improved in       Improved in
                   (1) not (2)       (2) not (1)       total
X in period (1)    3                 0                 3
X in period (2)    0                 4                 4
total              3                 4                 7

(b)                Outcome in periods (1) and (2)
                   same              different         total
X in period (1)    3                 3                 6
X in period (2)    2                 4                 6
total              5                 7                 12

If the possibility that the order of administration affects the results
is ignored then the use of the sign test (see § 10.2) shows that the
probability of observing 7 patients out of 7 improving on drug X (on
the null hypothesis that if the drugs are equi-effective there is a
50 per cent chance, 𝒫 = 1/2, of a patient improving) is simply P
= (1/2)^7 = 1/128. For a two-tail test (including the possibility of 7 out
of 7 not improving) P = 2/128 = 0.016, a far more optimistic result
than found above.
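
The arithmetic of this sign test is just a binomial tail with a chance of 1/2. A minimal sketch follows (the helper name `sign_test_p` is hypothetical, not from the text); it counts the outcomes at least as extreme as the observed one and doubles the tail for the two-tail result.

```python
from math import comb

def sign_test_p(n_positive, n, two_tail=True):
    """P of n_positive or more 'successes' out of n when each has chance 1/2."""
    tail = sum(comb(n, k) for k in range(n_positive, n + 1)) / 2 ** n
    # The two-tail value adds the equally extreme results in the other direction.
    return min(1.0, 2 * tail) if two_tail else tail

p_one = sign_test_p(7, 7, two_tail=False)   # (1/2)^7
p_two = sign_test_p(7, 7)                   # 2/128
print(p_one, round(p_two, 3))  # 0.0078125 0.016
```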






9. Numerical and rank measurements. 
Two independent samples 



'Heiße Magister, heiße Doktor gar,
Und ziehe schon an die zehen Jahr
Herauf, herab und quer und krumm
Meine Schüler an der Nase herum -
Und sehe, daß wir nichts wissen können!
Das will mir schier das Herz verbrennen.'†

† 'They call me master, even doctor, and for some ten years now I've led my students
by the nose, up and down, and around and in circles - and all I see is that we cannot
know! It nearly breaks my heart.'

Goethe
(Faust, Part 1, line 360)



9.1. Relationship between various methods 

In § 9.2 the randomization test (see §§ 6.3 and 8.2) is applied to
numerical observations. In § 9.3 the Wilcoxon, or Mann-Whitney,
test is described. This is a randomization test applied to the ranks
rather than the original observations. This has the advantage that
tables can be constructed to simplify calculations. These randomization
methods have the advantage of not assuming a normal distribution;
also they can cope with the rejection of particular allocations of
treatments to individuals that the experimenter finds unacceptable, as
described in § 8.3. They also emphasize the absolute necessity for
random selection of samples in the experiment if any analysis is to be
done. For large samples Student's t test, described in § 9.4, can be used,
though how large is large enough is always in doubt (see § 6.2). At
least four observations are needed in each sample, however large the
differences, unless the observations are known to be normally
distributed, as discussed in § 6.2.

9.2. Randomization test applied to numerical measurements 

The principle involved is just the same as in the case of classification 
measurements and § 8.2 should be read before this section, as the 
arguments will not all be repeated. 

Suppose that 4 patients are treated with drug A and 3 with drug B
as in § 8.2, but, instead of each being classified as improved or not
improved, a numerical measurement is made on each. For example,
the reduction of blood glucose concentration (mg/100 ml) following
treatment might be measured. Suppose the results were as in Table
9.2.1.

The numbering of the patients is arbitrary but notice that if a positive
response is counted as 'improved' and a negative one as 'not improved',
Table 9.2.1 is the same as Table 8.2.1(b) so, if the size of the improvement
is ignored, the results can be analysed exactly as in § 8.2.

However, with such a small sample it is easy to do the randomization
test on the measurements themselves. The argument is as in

Table 9.2.1

Responses (glucose concentration, mg/100 ml) to two drugs. The ranks of
the responses are given for use in § 9.3

              Drug A                          Drug B
Patient    Response             Patient    Response
number    (mg/100 ml)   Rank    number    (mg/100 ml)   Rank    Total
   1          10          5        4           5          4
   2          15          6        6          -3          2
   3          20          7        7          -5          1
   5          -2          3
Total         43         21                   -3          7       40



§ 8.2. See p. 117 for details. If the drugs were really equi-effective (the
null hypothesis) each patient would have shown the same response
whichever drug had been given, so the apparent difference between
drugs would depend solely on which patients happened to be selected
for the A group and which for the B group, i.e. on how the random
numbers happened to come up in the selection of 4 out of the 7 for drug
A. Again, as in § 8.2, the seven measurements could be written on
cards from which 4 are selected at random (just as in the real experiment)
and called A, the other 3 being B. The difference between the
mean for A and the mean for B is noted and the process repeated many
times. There is actually no need to calculate the difference between
means each time. It is sufficient to look at the total response for drug
B (taking the smaller group for convenience) because once this is
known the total for A follows (the total of all 7 being always 40), and
so the difference between means also follows. If the experimentally
observed total response for B (-3 in the example), or a more extreme
(i.e. smaller in this example) total, arises very rarely in the repeated
randomizations it will be preferred to suppose that the difference
between samples is caused by a real difference between drugs and the
null hypothesis will be rejected, just as in § 8.2.

Table 9.2.2

Enumeration of all 35 possible ways of selecting a group of 3 patients from
7 to be given drug B. The response for each patient is given in Table
9.2.1. The total ranks for drug B are given for use in § 9.3

Patients       Total       Total      Patients       Total       Total
given B     (mg/100 ml)    rank       given B     (mg/100 ml)    rank
1  2  3         45          18        2  3  7         30          14
1  2  4         30          15        2  4  5         18          13
1  2  5         23          14        2  4  6         17          12
1  2  6         22          13        2  4  7         15          11
1  2  7         20          12        2  5  6         10          11
1  3  4         35          16        2  5  7          8          10
1  3  5         28          15        2  6  7          7           9
1  3  6         27          14        3  4  5         23          14
1  3  7         25          13        3  4  6         22          13
1  4  5         13          12        3  4  7         20          12
1  4  6         12          11        3  5  6         15          12
1  4  7         10          10        3  5  7         13          11
1  5  6          5          10        3  6  7         12          10
1  5  7          3           9        4  5  6          0           9
1  6  7          2           8        4  5  7         -2           8
2  3  4         40          17        4  6  7         -3           7
2  3  5         33          16        5  6  7        -10           6
2  3  6         32          15



With such small samples the result of such a physical randomization
can be predicted by enumerating all 7!/(3!4!) = 35 possible ways (see
eqn. (3.4.2)) of dividing 7 patients into samples of 3 and 4. This prediction
depends on each of the possible ways being equiprobable, i.e.
the one used for the actual experiment must have been picked at random if
the analysis is to be valid. The enumeration is done in Table 9.2.2. This
table is exactly analogous to Table 8.2.2 but instead of counting the
number improved, the total response is calculated. For example, if
patients 1, 5, and 6 had been allocated to drug B the total response
would have been 10+(-2)+(-3) = 5 mg/100 ml. The results from
Table 9.2.2 are collected in Table 9.2.3, which shows the randomization
distribution (on the null hypothesis) of the total response to drug B.
This is exactly analogous to Fig. 8.2.1. The observed total (-3) and
smaller totals (the only smaller one is -10) are seen to occur 2/35
(= 0.057) times, if the null hypothesis is true, and this is therefore the
one-tail P. For a two-tail test (see § 6.1) an equal area can be cut off
in the other tail (total for B ≥ 40), so the result of the two-tail test is
P = 4/35 = 0.114. This is not small enough to cast much suspicion on
the truth of the null hypothesis, but it is somewhat different from the
P = 0.372 (one tail) found in the analysis of Table 8.2.1(b), to which,
as mentioned above, Table 9.2.1 reduces if the sizes of the improvements
are ignored. In § 8.2 a one-tail P = 0.372 was found and a
two-tail test was not possible. The reason for the difference is that in
the results in Table 9.2.1 the 'improvements' on drug A are much
greater in size than the (negative) 'non-improvements' on drug B.
The two-tail test can be done since in § 8.2 all 35 randomizations
yielded only 4 different possible results (Table 8.2.1) for the trial, but
with numerical measurements the 35 randomizations have yielded
27 possible results, listed in Table 9.2.3, so it is possible to cut off
equal areas in each tail (cf. § 6.1). Notice that if patient 3 had been
in the B group and patient 4 in the A group (this leaves Tables 9.2.2
and 9.2.3 unchanged) the observed total for group B would have been
20+(-3)+(-5) = 12 and it is seen from Table 9.2.3 that a total
≤12 occurs in a proportion 13/35 = 0.372 of cases. This one-tail P
(when a large improvement, patient 3, is seen with drug B) is as large
as that found in § 8.2.
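
The enumeration described above is easy to reproduce by machine. The sketch below is illustrative only: the variable names are invented, and the assignment of responses to patient numbers is an assumption inferred from the worked examples in the text (patients 1, 5 and 6 giving 10, -2 and -3; patients 4, 6 and 7 being the observed B group). It lists the total response for every one of the 35 possible B groups and counts those in each tail.

```python
from itertools import combinations

# Response (mg/100 ml) for each patient; the numbering is an assumption
# inferred from the worked examples in the text.
responses = {1: 10, 2: 15, 3: 20, 4: 5, 5: -2, 6: -3, 7: -5}
observed_b = (4, 6, 7)   # patients actually given drug B

# Total response of every possible group of 3 patients given drug B.
totals = [sum(responses[p] for p in group)
          for group in combinations(sorted(responses), 3)]

observed_total = sum(responses[p] for p in observed_b)
n_lower = sum(t <= observed_total for t in totals)   # observed total or smaller
n_upper = sum(t >= 40 for t in totals)               # equal area in the other tail

p_one_tail = n_lower / len(totals)
p_two_tail = (n_lower + n_upper) / len(totals)

print(len(totals), len(set(totals)))                 # 35 27
print(round(p_one_tail, 3), round(p_two_tail, 3))    # 0.057 0.114
```

Changing `observed_b` to `(3, 6, 7)` gives an observed total of 12, and 13 of the 35 totals are then 12 or less, as stated in the text.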

With larger samples there are too many permutations to enumerate
easily. For two samples of 10 there are (by (3.4.2)) 20!/(10!10!)
= 184756 ways of selecting a sample of 10 from 20 individuals. However, it
is not difficult for a computer to test a large sample of these possible
allocations by simulating the physical randomization (random assortment
of cards) mentioned at the beginning of this section, and of
§ 8.2. Programs for doing this do not seem to be widely available at
the moment but will doubtless become more common. This method has
the advantage that it can allow for the rejection of a random arrangement
that the experimenter finds unacceptable (e.g. all men in one
sample) as explained in § 8.3. The results in Table 9.2.4 are observations




Table 9.2.3

Randomization distribution of total response (mg/100 ml) of a group of
3 patients given drug B (according to the null hypothesis that A and B
are equi-effective). Constructed from Table 9.2.2

Total for drug B
  (mg/100 ml)       Frequency
     -10                1
      -3                1
      -2                1
       0                1
       2                1
       3                1
       5                1
       7                1
       8                1
      10                2
      12                2
      13                2
      15                2
      17                1
      18                1
      20                2
      22                2
      23                2
      25                1
      27                1
      28                1
      30                2
      32                1
      33                1
      35                1
      40                1
      45                1

    Total              35



made by Cushny and Peebles (1905) on the sleep-inducing properties of
(-)-hyoscyamine (drug A) and (-)-hyoscine (drug B). They were
used in 1908 by W. S. Gosset ('Student') as an example to illustrate the
use of his t test, in the paper in which the test was introduced.†
If two randomly selected groups of ten patients had been used, a
randomization test of the sort just described could be done as follows.
(In the original experiment the samples were not in fact independent
but related. The appropriate methods of analysis will be discussed in
Chapter 10.) A random sample of 12000 from the 184756 possible
permutations was inspected on a computer and the resulting randomization
distribution of the total response to drug A is plotted in Fig. 9.2.1

† In this paper the names of the drugs were mistakenly given as (-)-hyoscyamine
and (+)-hyoscyamine. When someone pointed this out Student commented in a letter
to R. A. Fisher, dated 7 January 1935, 'That blighter is of course perfectly right and
of course it doesn't really matter two straws . . .'
Table 9.2.4

Response in hours extra sleep (compared with controls) induced
by (-)-hyoscyamine (A) and (-)-hyoscine (B).
From Cushny and Peebles (1905)

    Drug A          Drug B
     +0.7            +1.9
     -1.6            +0.8
     -0.2            +1.1
     -1.2            +0.1
     -0.1            -0.1
     +3.4            +4.4
     +3.7            +5.5
     +0.8            +1.6
      0.0            +4.6
     +2.0            +3.4

  Σy_A = 7.5      Σy_B = 23.3
   n_A = 10        n_B = 10
   ȳ_A = 0.75      ȳ_B = 2.33



(cf. the distribution in Table 9.2.3 found for a very small experiment).
Of the 12000 permutations 488 gave a total response to drug A of less
than 7.5, the observed total (Table 9.2.4), so the result of a one-tail
randomization test is P = 488/12000 = 0.04067. With samples of this
size there are so many possible totals that the distribution in Fig. 9.2.1
is almost continuous, so it will be possible to cut off a virtually equal
area in the opposite (upper) tail of the distribution. Therefore the
result of a two-tail test can be taken as P = 2×0.04067 = 0.0813. This
is not low enough for the null hypothesis of equi-effectiveness of the
drugs to be rejected with safety because the observed results would not
be unlikely if the null hypothesis were true. The distribution in Fig.
9.2.1, unlike that in Table 9.2.3, looks quite like a normal (Gaussian)
distribution, and it will be found that the t test gives a similar result
to that just found.
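
A sampled randomization test of this kind might be sketched as follows. This is not the program used for the book: the function name, the seed and the use of Python's `random.sample` are all assumptions made for illustration, and a total equal to the observed one is counted as 'as extreme'. Because the allocations are drawn at random, the proportion found will differ slightly from the 488/12000 quoted above.

```python
import random

# Observations from Table 9.2.4 (hours of extra sleep).
drug_a = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
drug_b = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

def randomization_p(a, b, n_perm=12000, seed=1):
    """One-tail P: proportion of random allocations giving a total for A
    no larger than the observed total."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    pooled = a + b
    observed = sum(a)
    hits = 0
    for _ in range(n_perm):
        # Draw len(a) of the pooled observations without replacement,
        # i.e. deal the 'cards' to drug A at random.
        total_a = sum(rng.sample(pooled, len(a)))
        if total_a <= observed + 1e-9:   # tolerance for rounding error
            hits += 1
    return hits / n_perm

p_one_tail = randomization_p(drug_a, drug_b)
p_two_tail = 2 * p_one_tail
print(p_one_tail, p_two_tail)
```

An unacceptable allocation (see § 8.3) could be handled by discarding the draw inside the loop and sampling again, which is the advantage of this approach mentioned in the text.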



[Fig. 9.2.1: abscissa, total response to drug A (hours), with the corresponding value of the difference between means (ȳ_B - ȳ_A) shown beneath; the observed values, 7.5 and 1.58 respectively, are marked.]

Fig. 9.2.1. Randomization distribution of the total response to drug A
for Cushny and Peebles' results, when A and B are equi-effective (null hypothesis
true). The value of the difference between means corresponding to each total
for A is also shown on the abscissa (the total of all responses is 30.8 for every
allocation so, for example, if the total for A were 10.4, the total for B must be
20.4 so the difference between means is 1.0). Constructed from a random sample
of 12000 from the 184756 ways of allocating 10 patients out of 20 to drug A.

9.3. Two sample randomization test on ranks. The Wilcoxon (or
Mann-Whitney) test

The difficulty with the method described in § 9.2 is that it is not 
possible to prepare tables for all possible sets of observations. However, 
if the observations are ranked in ascending order and each observation 
replaced by its rank before performing the randomization test it is 
possible to prepare tables, because now every experiment with N
observations will involve the same numbers, 1, 2, . . . , N.

In addition to the fact that it is not necessary to assume a particular
form of distribution of the observations, another advantage is that the
method can be used for results that are themselves ranks, or results
that are not quantitative numerical measurements but can be ranked
in order of magnitude (e.g. subjective pain scores). Even with numerical
measurements the loss of information involved in converting them to
ranks is surprisingly small.

Assumptions. The null hypothesis is that both samples of observations 
come from the same population. If this is rejected, then, if it is wished 
to infer that the samples come from populations with different medians, 
or means, it must be assumed that the populations are the same in all 
other respects, for example that they have the same variance. 



[Fig. 9.3.1: abscissa, total of ranks for drug B (sum of 3 ranks from 7) if the null hypothesis were true.]

Fig. 9.3.1. Randomization distribution of the total of ranks for drug B
(sum of 3 ranks from 7) if null hypothesis is true. From Table 9.2.2. The mean is
12 and the standard deviation is 2.828 (from eqns. (9.3.2) and (9.3.3)). This is
the relevant distribution for any Wilcoxon two-sample test with samples of size
3 and 4.

The results in Table 9.2.1 will be analysed by this method. They 
have been ranked in ascending order from 1 to 7, the ranks being shown 
in the table. In Table 9.2.2 all 35 (equiprobable) ways of selecting 
3 patients from the 7 to be given drug B are enumerated. And for each 
way the total rank is given ; for example, if patients 1, 5, and 6 had had 
drug B then, on the null hypothesis that the response does not depend 
on the treatment, the total rank for drug B would be 5 + 3+2 = 10. 






The frequency of each rank total in Table 9.2.2 is plotted in Fig. 9.3.1,
which shows the randomization distribution of the total rank for drug
B (given the null hypothesis). This is exactly analogous to the distributions
of total response shown in Table 9.2.3 and Fig. 9.2.1, but the
distributions of total response depend on the particular numerical
values of the observations, whereas the distribution of the rank sum
(given the null hypothesis) shown in Fig. 9.3.1 is the same for any
experiment with samples of 3 and 4 observations. The values of the
rank sum cutting off 2.5 per cent of the area in each tail can therefore
be tabulated (Table A3, see below).

The observed total rank for drug B was 7, and from Fig. 9.3.1, or
Table 9.2.2, it can be seen that there are two ways of getting a total
rank of 7 or less, so the result of a one-tail test is P = 2/35 = 0.057.
An equal probability, 2/35, can be taken in the other tail (total rank of
17 or more) so the result of a two-tail test is P = 4/35 = 0.114. This
is the probability that a random selection of 3 patients from the 7
would result in the potency of drug B (relative to A) appearing to be
as small as (total rank = 7), or smaller than (total rank < 7), was
actually observed, or an equally extreme result in the opposite direction,
if A and B were actually equi-effective. Since such an extreme apparent
difference between drugs would occur in 11.4 per cent of experiments
in the long run, this experiment might easily have been one of the
11.4 per cent, so there is no reason to suppose the drugs really differ
(see § 6.1). In this case, but not in general, the result is exactly the
same as found in § 9.2.
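
Because the rank-sum distribution depends only on the sample sizes, it can be generated once for any pair of sizes. The following sketch (the function name `rank_sum_distribution` is made up for illustration, and ties are assumed absent) enumerates the sum of 3 ranks chosen from 1 to 7 and recovers the tail counts used above.

```python
from collections import Counter
from itertools import combinations

def rank_sum_distribution(n_small, n_total):
    """Frequency of each possible sum of n_small ranks chosen from 1..n_total."""
    return Counter(sum(c) for c in combinations(range(1, n_total + 1), n_small))

dist = rank_sum_distribution(3, 7)
n_ways = sum(dist.values())                  # 35 equiprobable allocations

observed = 7                                 # observed total rank for drug B
mirror = 3 * (7 + 1) - observed              # equally extreme total in the upper tail (17)
lower = sum(f for s, f in dist.items() if s <= observed)
upper = sum(f for s, f in dist.items() if s >= mirror)

p_one_tail = lower / n_ways
p_two_tail = (lower + upper) / n_ways
print(n_ways, lower, upper)                          # 35 2 2
print(round(p_one_tail, 3), round(p_two_tail, 3))    # 0.057 0.114
```

The same function with arguments `(10, 20)` would give the distribution needed for two samples of 10, which is what a table such as Table A3 summarizes.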

A check can be applied to the rank sums, based on the fact that the
mean of the first N integers, 1, 2, 3, . . . , N, is (N+1)/2 so therefore

$$\text{sum of the first } N \text{ integers} = N(N+1)/2. \tag{9.3.1}$$

In this case 7(7+1)/2 = 28, and this agrees with the sum of all ranks
(Table 9.2.1), which is 21 + 7 = 28.

The distribution of rank totals in Fig. 9.3.1 is symmetrical, and this 
will be so as long as there are no ties. The result of a two-tail test will 
therefore be exactly twice that for a one-tail test (see § 6.1). 

The use of tables for the Wilcoxon test 

The results of Cushny and Peebles in Table 9.2.4, which were
analysed in § 9.2, are ranked in ascending order in Table 9.3.1. Where ties
occur each member is given the average rank as shown. This method
of dealing with ties is only an approximation if Table A3 is used
because the table refers to the randomization distribution of integers,
1, 2, 3, 4, 5, 6, . . . 20, not the actual figures used, i.e. 1, 2, 3, 4½, 4½, 6,
etc. Such evidence as there is suggests a moderate number of ties does
not cause serious error.

The rank sum for drug A is 1+2+3+4½+6+8+9½+14+15½+17
= 80½, and for drug B it is 129½. The sum of these, 80½+129½ = 210,
Table 9.3.1

The observations from Table 9.2.4 ranked in ascending order

Drug    Observation    Rank
          (hours)
 A         -1.6          1
 A         -1.2          2
 A         -0.2          3
 B         -0.1          4½   } (4+5)/2
 A         -0.1          4½   }
 A          0.0          6
 B          0.1          7
 A          0.7          8
 B          0.8          9½   } (9+10)/2
 A          0.8          9½   }
 B          1.1         11
 B          1.6         12
 B          1.9         13
 A          2.0         14
 B          3.4         15½   } (15+16)/2
 A          3.4         15½   }
 A          3.7         17
 B          4.4         18
 B          4.6         19
 B          5.5         20

Total                  210





checks with (9.3.1), which gives 20(20+1)/2 = 210. A randomization
distribution could be found, just as above and in § 9.2, for the sum of
10 ranks picked at random from 20. The proportion of possible allocations
of patients to drug A giving a total rank of 80½ or less is the
one-tail P, as above. The two-tail P may be taken as twice this value
though, as mentioned, this may not be exact when there are ties.
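
The ranking with average ranks for ties can be sketched in a few lines. The helper `average_ranks` below is hypothetical (it is not a library routine), and the data are those of Table 9.2.4; tied observations are detected by exact equality, which is adequate for figures typed to one decimal place.

```python
# Observations from Table 9.2.4 (hours of extra sleep).
drug_a = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
drug_b = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

def average_ranks(values):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of observations tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2   # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

ranks = average_ranks(drug_a + drug_b)
rank_sum_a = sum(ranks[:len(drug_a)])
rank_sum_b = sum(ranks[len(drug_a):])
n = len(ranks)

print(rank_sum_a, rank_sum_b)                   # 80.5 129.5
print(rank_sum_a + rank_sum_b, n * (n + 1) / 2) # 210.0 210.0, the check (9.3.1)
```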




P (two tail) can be found (approximately) from Table A3 in which
n₁ and n₂ are the sample sizes (n₁ ≤ n₂). For each pair of sample sizes
two figures are given. If the rank sum for sample 1 (that with n₁
observations) is equal to or less than the smaller tabulated figure,
or if it is equal to or greater than the larger tabulated figure, then
P (two tail) is not greater than the figure at the head of the column.
In this case n₁ = 10, n₂ = 10 and the pair of tabulated figures is
82, 128† for P = 0.1, and 78, 132 for P = 0.05. The observed rank
sum of 80½ is less than 82 but greater than 78, so P is between 0.1 and
0.05. This means that if the null hypothesis of equi-effectiveness were
true then the probability of observing a rank sum of 80½ or less would
be under 0.05, and the probability of observing a rank sum equally
extreme in the other direction would also be under 0.05, so the total
two-tail P (see § 6.1) is under 0.1. This result is similar to that found in
§ 9.2 using the slightly more powerful randomization test on the
observations themselves. It is not small enough to provide evidence for
a difference between the drugs.

How to deal with samples that are too large for Table A3 

Table A3 only deals with samples containing up to 20 observations.
For larger samples the randomization distribution of ranks (shown for a
small sample in Fig. 9.3.1) is well approximated by a normal distribution.
If the null hypothesis is true, the distribution of the rank sum,
R₁ say, for the sample with n₁ observations can be shown (see, for
example, Brownlee, 1965, p. 252) to have mean

$$\mu_1 = n_1(N+1)/2, \tag{9.3.2}$$

where N = n₁+n₂ is the total number of observations. For example, in
the first example discussed in this section, n₁ = 3, N = 7 so μ₁ =
3(7+1)/2 = 12, as is obvious by inspection of Fig. 9.3.1. The standard
deviation of R₁ is (loc. cit.)

$$\sigma = \sqrt{n_1 n_2 (N+1)/12}. \tag{9.3.3}$$

For the distribution in Fig. 9.3.1 the standard deviation is therefore
√[3×4(7+1)/12] = 2.828. Using these values, an approximate
standard normal deviate (see § 4.3) can be calculated from (4.3.1) as

$$u = \frac{R_1 - \mu_1}{\sigma} \tag{9.3.4}$$

and the rarity of the result judged from tables of the standard normal
distribution.

† These are the rank sums that cut off 5 per cent of the area in each tail (10 per cent,
P = 0.1, altogether), in the analogue of Fig. 9.3.1 for samples of size 10 and 10.

For example, the results in Table 9.3.1 gave n₁ = 10, N = 20,
R₁ = 80.5. Thus, from (9.3.2)-(9.3.4),

$$u = \frac{80.5 - 10(20+1)/2}{\sqrt{10 \times 10\,(20+1)/12}} = -1.85.$$

This value is found from tables (see § 4.3) to cut off an area P = 0.032
in the lower tail of the standard normal distribution. The result of a
two-tail test (see § 6.1) is therefore this area, plus the equal area above
u = +1.85, i.e. P = 2×0.032 = 0.064, in good agreement, even for
samples of 10, with the exact result from Table A3. The two-tail result
can be found directly by referring the value u = 1.85 to a table of
Student's t with infinite degrees of freedom (when t becomes the
same as u, see § 4.4).
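
The normal approximation of (9.3.2)-(9.3.4) can be sketched as below. The function name is invented for illustration, and instead of printed tables the normal tail area is obtained from the complementary error function in Python's `math` module, using the identity that the two-tail area beyond ±u is erfc(|u|/√2).

```python
from math import erfc, sqrt

def rank_sum_normal_approx(r1, n1, n2):
    """Approximate normal deviate and two-tail P for a rank sum r1
    from the sample with n1 observations (no correction for ties)."""
    n = n1 + n2
    mean = n1 * (n + 1) / 2               # eqn (9.3.2)
    sd = sqrt(n1 * n2 * (n + 1) / 12)     # eqn (9.3.3)
    u = (r1 - mean) / sd                  # eqn (9.3.4)
    p_two_tail = erfc(abs(u) / sqrt(2))   # area beyond -|u| plus area beyond +|u|
    return u, p_two_tail

u, p_two_tail = rank_sum_normal_approx(80.5, 10, 10)
print(round(u, 2), round(p_two_tail, 3))  # -1.85 0.064
```

For the small example of Fig. 9.3.1, `rank_sum_normal_approx(7, 3, 4)` uses a mean of 12 and a standard deviation of 2.828, though with samples this small the approximation should not be relied upon.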



9.4. Student's t test for independent samples. A parametric test

This test, based on Student's t distribution (§ 4.4), assumes that
the observations are normally distributed. Since this is rarely known
it is safer to use the randomization test (§ 9.2) or, more conveniently,
the Wilcoxon two-sample test (§ 9.3) (see §§ 4.2, 4.6 and 6.2). It will
now be shown that when the results in Table 9.2.4 are analysed using the t
test the result is similar to that obtained using the more assumption-free
methods of §§ 9.2 and 9.3. But it cannot be assumed that the agreement
between methods will always be so good with two samples of 10. It
depends on the particular figures observed. If the observations were
very non-normal the t test might be quite misleading with samples of 10.
The assumptions of the test are explained in more detail in § 11.2 and
there is much to be said for always writing the test as an analysis of
variance as described at the end of § 11.4. There was no evidence that
the assumptions were true in this example.

To perform the t test it is necessary to assume that the observations
are independent, i.e. that the size of one is not affected by the size of
others (this assumption is necessary for all the tests described), and
that the observations are normally distributed, and that the standard
deviation is the same for both groups (drugs). The scatter is estimated
for each drug separately and the results pooled. The quantity of
interest is the difference between mean responses ($\bar{y}_B - \bar{y}_A$), so the
object is to estimate the standard deviation of the difference, $s[\bar{y}_B - \bar{y}_A]$,†
so that it can be predicted (see example in § 2.7) how much scatter
would be seen in ($\bar{y}_B - \bar{y}_A$) if it were determined many times (this
prediction is likely to be optimistic, see § 7.2).

(1) For drug A the sum of squared deviations, using (2.6.5), is

$$\sum (y-\bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n} = 34.43 - \frac{(7.5)^2}{10} = 28.805$$

with n_A - 1 = 10 - 1 = 9 degrees of freedom (see § 2.6).

(2) For drug B the sum of squared deviations is similarly

$$\sum (y-\bar{y})^2 = 90.37 - \frac{(23.3)^2}{10} = 36.081$$

with n_B - 1 = 10 - 1 = 9 degrees of freedom.

(3) The pooled estimate of the variance of y (the response to either
drug) is

$$s^2[y] = \frac{\text{total sum of squares}}{\text{total degrees of freedom}} = \frac{28.805 + 36.081}{9 + 9} = 3.605$$

with 9+9 = 18 degrees of freedom. As it is necessary to assume that
the scatter of responses is the same for both groups, a single pooled
estimate of this scatter is made.

(4) Using (2.7.8), the variance of the mean of 10 observations on
drug A is estimated to be

$$s^2[\bar{y}_A] = \frac{s^2[y]}{n_A} = \frac{3.605}{10} = 0.3605,$$

and similarly the variance of the mean of 10 observations on drug B is
estimated as

$$s^2[\bar{y}_B] = \frac{s^2[y]}{n_B} = \frac{3.605}{10} = 0.3605.$$


† Note that this means the estimated standard deviation, s, of the random variable
($\bar{y}_B - \bar{y}_A$). It is the functional notation described in § 2.1. It does not mean s times
($\bar{y}_B - \bar{y}_A$).

(5) Using (2.7.3) the variance of the difference between two such
means (assuming them to be independent, see also § 10.7) is

$$s^2[\bar{y}_B - \bar{y}_A] = s^2[\bar{y}_A] + s^2[\bar{y}_B] = 0.3605 + 0.3605 = 0.7210.$$

The standard deviation of the difference between means is therefore
√(0.7210) = 0.8491 hours = $s[\bar{y}_B - \bar{y}_A]$, with 18 degrees of freedom.

(6) The definition of t, given in (4.4.1), is (x - μ)/s(x) where x is
normally distributed and s(x) is its estimated standard deviation. In
this case the normally distributed variable of interest is the difference
between mean responses, ($\bar{y}_B - \bar{y}_A$). It is required to test the null
hypothesis that the drugs are equi-effective, i.e. that the population
value of the difference between means is zero, μ = 0, and therefore
μ = 0 is used in the expression for t because, as usual, it is required to
find out what would happen if the null hypothesis were true. Inserting
these quantities gives, on the null hypothesis,

$$t = \frac{(\bar{y}_B - \bar{y}_A) - 0}{s[\bar{y}_B - \bar{y}_A]} = \frac{2.33 - 0.75}{0.8491} = 1.861.$$

Reference to a table of the distribution of t (see § 4.4, p. 77) for 18
degrees of freedom shows that 5 per cent of the area lies outside
t = ±2.101 and 10 per cent lies outside t = ±1.734 (cf. §§ 4.4 and
6.1). Therefore, for a two-tail test, P is between 0.05 and 0.1. This would
be the probability of observing a value of t differing from zero by as
much as, or more than, 1.861, if the null hypothesis were true, and if
the assumptions of normality, etc. were correct. It is not small enough
to make one reject the null hypothesis that the drugs are really
equi-effective (μ = 0). See also § 6.1.

In general, to compare two independent samples (A and B) of
normally distributed mutually independent observations one calculates,
condensing the above argument into a single formula,

$$t = \frac{\left|(\bar{y}_B - \bar{y}_A) - \mu\right|}{\sqrt{\left[\dfrac{\sum (y_A - \bar{y}_A)^2 + \sum (y_B - \bar{y}_B)^2}{n_A + n_B - 2}\left(\dfrac{1}{n_A} + \dfrac{1}{n_B}\right)\right]}} \tag{9.4.1}$$

where n_A and n_B are the numbers of observations in each sample (not
necessarily equal); μ is the hypothetical value of the difference to be
tested (most often zero, but see example in § 12.5 in which it is not),
and the vertical bars round the numerator indicate that its sign is
ignored, i.e. t is taken as positive. This quantity is referred to tables of
the t distribution (with n_A + n_B - 2 degrees of freedom) in order to
find P.
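
Formula (9.4.1) translates directly into code. The sketch below is illustrative only (the function name `pooled_t` is an assumption, not part of the text); it follows steps (1) to (6) for the figures in Table 9.2.4 and returns t, the degrees of freedom and the standard deviation of the difference between means. Finding P still requires a table of the t distribution.

```python
from math import sqrt

# Observations from Table 9.2.4 (hours of extra sleep).
drug_a = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
drug_b = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

def pooled_t(a, b, mu=0.0):
    """Student's t for two independent samples with a pooled variance,
    following eqn (9.4.1); mu is the hypothetical difference tested."""
    n_a, n_b = len(a), len(b)
    mean_a, mean_b = sum(a) / n_a, sum(b) / n_b
    ss_a = sum((y - mean_a) ** 2 for y in a)   # sum of squared deviations, A
    ss_b = sum((y - mean_b) ** 2 for y in b)   # sum of squared deviations, B
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df            # s^2[y]
    sd_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = abs((mean_b - mean_a) - mu) / sd_diff  # sign ignored, as in (9.4.1)
    return t, df, sd_diff

t, df, sd_diff = pooled_t(drug_a, drug_b)
print(round(t, 3), df, round(sd_diff, 4))  # 1.861 18 0.8491
```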

Use of confidence limits leads to the same conclusion as the t test 

The variable of interest is the difference between means ($\bar{y}_B - \bar{y}_A$)
and its observed value in the example was 2.33-0.75 = 1.58 hours.
The standard deviation of this quantity was found to be 0.8491 hours.
The expression found in § 7.4 for the confidence limits for the population
mean of any normally distributed variable x, viz. x ± ts(x), will be
used. For 90 per cent (or P = 0.9) confidence intervals the P = 0.1
value of t (with 18 d.f.) is found from tables. It is, as mentioned above,
1.734. Gaussian confidence limits for the population mean value of
($\bar{y}_B - \bar{y}_A$) are therefore 1.58 ± (1.734×0.8491), i.e. from 0.11 to 3.05
hours. Because these do not include the hypothetical value of zero
(implying that the drugs are equi-effective) the observations are not
compatible with this hypothesis, if P = 0.9 is a sufficient level of
certainty. For a greater degree of certainty 95 per cent confidence
limits would be found. The value of t for P = 0.05 and 18 d.f. was
found above to be 2.101 so the Gaussian confidence limits are 1.58
± (2.101×0.8491), i.e. from -0.20 to +3.36 hours. At this level of
confidence the results are compatible with a population difference
between means of μ = 0, because the limits include zero. These results
imply that confidence limits can be thought of in terms of a significance
test. For any given probability level (α) the variable will be found
'significantly' different (at the P = α level) from any hypothetical
value (zero in this case) of the variable that falls outside the 100(1-α)
per cent confidence limits.
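
The two sets of limits can be reproduced as follows. This is a small sketch with an invented helper, `confidence_limits`; the values of t (1.734 and 2.101 for 18 degrees of freedom) are simply the tabulated figures quoted in the text, not computed here.

```python
def confidence_limits(estimate, sd, t_value):
    """Gaussian confidence limits: estimate -/+ t * (standard deviation)."""
    half_width = t_value * sd
    return estimate - half_width, estimate + half_width

difference = 1.58   # observed difference between means (hours)
sd_diff = 0.8491    # its estimated standard deviation, 18 d.f.

low90, high90 = confidence_limits(difference, sd_diff, 1.734)  # P = 0.1 value of t
low95, high95 = confidence_limits(difference, sd_diff, 2.101)  # P = 0.05 value of t

print(round(low90, 2), round(high90, 2))  # 0.11 3.05
print(round(low95, 2), round(high95, 2))  # -0.2 3.36

# Zero lies outside the 90 per cent limits but inside the 95 per cent limits,
# matching a two-tail P between 0.05 and 0.1.
print(low90 > 0, low95 < 0 < high95)      # True True
```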






10. Numerical and rank measurements. 
Two related samples 



10.1. Relationship between various methods 

The observations on the soporific effect of two drugs in Table 9.2.4
were analysed in §§ 9.2-9.4 as though they had been made on two
independent samples of 10 patients. In fact both of the drugs were
tested on each patient,† so there were only 10 patients altogether.
The unit on which each pair of observations is made is called, in general,
a block (= patient in this case). It is assumed throughout that observations
are independent of each other. This may not be true when the
pair of observations are both made on the same subject as in Table
10.1.1, rather than on two different subjects who have been matched
in some way. The responses may depend on whether A or B is given
first, for example, because of a residual effect of the first treatment,
or because of the passage of time. It must be assumed that this does
not happen. See § 8.6 for a discussion of this point. Appropriate
analyses for related samples (see § 6.4) are discussed in this chapter.
Chapters 6-9 should be read first.

Because results of comparisons on a single patient are likely to be
more consistent than observations on two separate patients, it seems
sensible to restrict attention to the difference in response between
drugs A and B. These differences, denoted d, are shown in Table
10.1.1. The total for each pair is also tabulated for use in §§ 11.2 and
11.6.

These results will be analysed by four methods. The sign test (§ 10.2)
is quick and nonparametric and, alone in this chapter, it does not
need quantitative numerical measurements; scores or ranks will do.
The randomization test on the observed differences (§ 10.3) is best for
quantitative numerical measurements. It suffers from the fact that,
like the analogous test for independent samples (§ 9.2), it is impossible
to construct tables for all possible observations; so, except in extreme
cases (like this one), the procedure, though very simple, will be lengthy
unless done on a computer. In § 10.4 this problem is overcome, as in
§ 9.3, by doing the randomization test on ranks instead of on the
original observations: the Wilcoxon signed ranks test. This is the
best method for routine use (see § 6.2). In § 10.6 the test based on the

† Whether A or B is given first should be decided randomly for each patient. See
§§ 6.4 and 2.3.



Table 10.1.1 

The results from Table 9.2.4 presented in a way showing 
how the experiment was really done 



Patient 






Difference 


Total 


(block) 


Vk 


Vm 


d = <ir.-¥ A > 


<F.+Va) 


1 


+0-7 


+ 1*9 


+ 1-2 


26 


2 


-1-6 


+ 0-8 


+ 2-4 


-0-8 


3 


-0-2 


+M 


+ 1-3 


0-9 


4 


-1-2 


+ 0-1 


+ 1-3 


-M 


5 


-01 


-01 


0 


-0-2 


6 


+ 3-4 


+ 4-4 


+ 10 


7-8 


7 


+ 3-7 


+6-5 


+ 1-8 


9-2 


8 


+0-8 


+ 1-6 


+ 0-8 


2-4 


9 


0 


+ 4-6 


+4-8 


4-0 


10 


+20 


4-3-4 


+ 1'4 


5-4 


Totals 


7-6 


22-3 


16-8 


30-8 



mean 1-58 



assumption of a normal distribution, Student's paired t test, is described 
(see §§ 6.2 and 9.4). Unless the distribution of the observations is 
known to be normal, at least six pairs of observations are needed, as 
discussed in § 6.2 (see also § 10.6). 

10.2. The sign test 

This test is based on the proposition that the difference between the
two readings of a pair is equally likely to be positive or negative if the
two treatments are equi-effective (the null hypothesis). This means
(if zero differences are ignored) that there is a 50 per cent chance (i.e.
𝒫 = 0.5) of a positive difference and a 50 per cent chance of a negative
difference. In other words, the null hypothesis is that the population
(true) median difference is zero. The argument is closely related to
that in §§ 7.3 and 7.7 (see below). It is sufficient to be able to rank the
members of each pair. Numerical measurements are not necessary.

Example (1). In Table 10.1.1 there are 9 positive differences out of
9 (the zero difference is ignored, though a better procedure is probably
to allocate to it a sign that is on the safe side; see the footnote to
Method (1) below). If the probability of a positive difference is 1/2
(null hypothesis) then the probability of observing 9 positive
differences in 9 'trials of the event' (just like 9 heads out of 9 tosses
of an unbiased coin) is given by the binomial distribution (3.4.3) as
(1/2)^9 = 1/512 ≈ 0.002. For a two-tail test of significance (see § 6.1)
equally extreme deviations in the opposite direction (i.e. 9 negative
signs out of 9) must be taken into account and for this P ≈ 0.002 also,
so the result of a two-tail sign test is P ≈ 0.004. This is substantially
lower than the values obtained in Chapter 9 (when it was not taken into
account that the samples were related) and suggests rejection of the
null hypothesis because results deviating from it by as much as was
actually observed would be rare if it were true.

Example (2). If there had been one negative difference (however
small) and 9 positive ones, then the one-tail P (see § 6.1) would be the
probability of observing 9 or more positive signs out of 10. This would
be the situation if it were decided to count the zero difference in Table
10.1.1 as negative, to be on the safe side. From the binomial
distribution, (3.4.3), the probability of observing 9 positive differences
out of 10 is

$$P(9) = \frac{10!}{9!\,1!}\left(\frac{1}{2}\right)^{9}\left(\frac{1}{2}\right)^{1} = \frac{10}{1024} = 0.00976$$

and the probability of 10 positive differences out of 10 (the only more
extreme result) is P(10) = (1/2)^10 = 0.000976. Therefore the probability
of observing 9 or 10 positive signs out of 10 is 0.00976 + 0.000976
= 0.0107. The two-tail P (see § 6.1) includes equally extreme results
in the other direction (1 or fewer positive signs out of 10, i.e. 9 or more
negative signs) for which P = 0.0107 also, so the two-tail P = 0.0107
+ 0.0107 = 0.0214.

This means, in words, that if the null hypothesis (that 𝒫 = 0.5,
implying equi-effectiveness of the treatments) were true, then, in the
long run, 2.14 per cent of repeated experiments would give results
differing in either direction from the results expected on the null
hypothesis (i.e. 5 negative signs out of 10) by as much as, or more than,
was actually observed in the experiment. This is a sufficiently rare event
to cast some doubt on the premise of equi-effectiveness (see § 6.1).





The general result 

Generalizing the argument shows that if r_obs differences out of n are
observed to be negative (or positive, if there are fewer positive signs
than negative), then the result of a two-tail test of the hypothesis that
the population median difference is zero is

$$P = 2\sum_{r=0}^{r_{\mathrm{obs}}} \frac{n!}{r!\,(n-r)!}\left(\frac{1}{2}\right)^{n} \qquad (10.2.1)$$
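The general result can be checked numerically. The following is a minimal sketch in Python, not part of the original text; the function name `sign_test_two_tail` is an assumption made for illustration. It evaluates twice the binomial tail with probability 1/2, which is the reading of (10.2.1) used in Examples (1) and (2).

```python
from math import comb


def sign_test_two_tail(r_obs: int, n: int) -> float:
    """Two-tail sign test P for r_obs signs of the rarer kind out of n.

    Evaluates 2 * sum_{r=0}^{r_obs} C(n, r) * (1/2)**n, i.e. twice the
    lower binomial tail when the probability of either sign is 1/2.
    """
    tail = sum(comb(n, r) for r in range(r_obs + 1)) / 2 ** n
    # Twice the tail can exceed 1 when r_obs is close to n/2, so cap it.
    return min(1.0, 2 * tail)


p_example_1 = sign_test_two_tail(0, 9)    # 0 negative signs out of 9
p_example_2 = sign_test_two_tail(1, 10)   # 1 negative sign out of 10
print(p_example_1, p_example_2)
```

The first value is 2/512 (about 0.004) and the second is 22/1024 (about 0.021); the figure 0.0214 quoted in Example (2) arises from adding terms that had already been rounded.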

How to find the results without calculation 

There are several ways of using tables to get the result. 

Method (1). One way is to find confidence limits for the median
difference (d value) from Table A1, as described in § 7.3. If the
confidence limits do not include zero (or, more generally, any other
hypothetical value that it is wished to test for consistency with the
observations), then, as explained in § 9.4, the observations are not
consistent with the null hypothesis. For example, the results (d values)
in Table 10.1.1 consist of n = 9 non-zero differences.† The method of
§ 7.3 shows, using Table A1, that 99.6 per cent confidence limits for
the population median are provided by the largest and smallest of the
nine observations, i.e. +0.8 to +4.6. These limits do not include zero,
so the results are not consistent with the null hypothesis and the result
for a two-tail test is that P is not greater than 1 − 0.996 = 0.004 (in
this case P = 0.004 as shown above).

Putting the matter more precisely, the exact value of P for the
confidence limits that just fail to include zero (e.g. such that the next
smallest observation below the lower limit would be negative) will
be the same as the exact value of P for a two-tail test (see method
(2) below). By way of example suppose that patient 5 (Table 10.1.1)
had given a difference of −0.01 (rather than zero) in Example (2)
above. Table A1 shows that the 99.8 per cent confidence limits for the
population median difference, based on a sample of n = 10 differences,
are provided by the largest and smallest observations, −0.01 to +4.6.
These limits include zero. The 97.86 per cent confidence limits, from
Table A1, are the next-to-smallest and next-to-largest observations,
i.e. +0.8 and +2.4, which just fail to include zero. This agrees with the
exact two-tail result, P = 0.0214 (= 1 − 0.9786), found by direct
calculation above.

† This situation shows the difficulties that can be introduced by ties.
There is no reason to exclude the zero difference when finding confidence
limits for the median, but the results will only agree exactly with the
sign test (from which the zero was omitted) if this is done. The best
answer is probably to be on the safe side. This usually means counting
the zero difference as though it had the sign least conducive to rejection
of the null hypothesis. In the example discussed this means pretending
that patient 5 actually had a negative difference. Example (2) shows that
P = 0.0214 in this case.
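The confidence levels quoted in Method (1) can be reproduced as follows. This is a sketch only, assuming (as the comparison of (10.2.1) with (7.3.3) suggests) that Table A1 is built from the binomial distribution with probability 1/2; the function names `median_ci_level` and `median_ci` are illustrative and do not come from the text.

```python
from math import comb


def median_ci_level(n: int, r: int) -> float:
    """Confidence level of the interval running from the r-th smallest
    to the r-th largest of n observations, taken as limits for the
    population median (binomial with probability 1/2 assumed)."""
    tail = sum(comb(n, i) for i in range(r)) / 2 ** n
    return 1 - 2 * tail


def median_ci(values, r):
    """Return the r-th smallest and r-th largest observation."""
    ordered = sorted(values)
    return ordered[r - 1], ordered[-r]


# Differences from Table 10.1.1 with patient 5 changed to -0.01
d = [1.2, 2.4, 1.3, 1.3, -0.01, 1.0, 1.8, 0.8, 4.6, 1.4]

level_outer = median_ci_level(10, 1)   # largest and smallest observations
level_inner = median_ci_level(10, 2)   # next-to-smallest, next-to-largest
limits_outer = median_ci(d, 1)
limits_inner = median_ci(d, 2)
print(level_outer, limits_outer)
print(level_inner, limits_inner)
```

The outer limits include zero at a level of about 99.8 per cent, whereas the inner limits, at about 97.9 per cent, just fail to include it, in agreement with the two-tail sign test.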

Method (2). The same result is obtained if Table A1 is entered with
r = r_obs + 1. This is obvious if (10.2.1) is compared with (7.3.3).
Consideration of a few examples shows that if limits are taken as the
(r_obs + 1)th observation from each end of the ranked observations, the
limits will just fail to include zero. For example, in Table 10.1.1, as
just discussed, r = r_obs + 1 = 1 gives P = 1 − 0.996 = 0.004. Likewise
in the second example above, r_obs = 1 negative sign out of n = 10.
Entering Table A1 with n = 10 and r = r_obs + 1 = 2 gives the result of
the two-tail significance test as P = 1 − 0.9786 = 0.0214, exactly as
found from first principles above.

Method (3). As might be expected, the same result can be obtained by
finding confidence limits, 𝒫_H and 𝒫_L, for the population proportion
(𝒫) of positive (or negative) differences and seeing whether these limits
include 𝒫 = 0.5 or not. The method has been described in § 7.7 and
the result can be obtained, as explained there, from Table A2. It will be
left to the reader to improve his moral fibre by showing (by comparing
(7.7.1), (7.7.2), and (7.3.2)) that if the upper confidence limit for
the population median difference, found above, just fails to include
zero, then it will be found that the upper confidence limit for 𝒫 is
equal to or less than 0.5. Similarly, if the lower confidence limit for the
population median just fails to include zero then it will be found that
𝒫_L ≥ 0.5.

For example, in Table 10.1.1, r_obs = 0 out of n = 9 differences were
negative, so 100r/n = 0 per cent negative differences were observed.
Entering Table A2 with r = 0 and n = 9 shows that 99 per cent
confidence limits for the population proportion of negative differences
are 𝒫_L = 0 and 𝒫_H = 0.445. These limits do not include 0.5 (as
expected) and this implies that for a two-tail significance test P
< 0.01 (i.e. 1 − 0.99), as found above.

In the second example above (r_obs = 1 negative difference out of
n = 10), consulting Table A2 with r = 1, n = 10, gives 95 per cent
confidence limits for the population proportion (𝒫) of negative
differences as 0.0025 and 0.445, which do not include 0.5. This is as
expected from the fact that the 97.86 per cent (which is as near to 95 per
cent as it is possible to get, see § 7.3) confidence limits for the
population median difference, +0.8 to +2.4 found above, just fail to
include zero. The 99 per cent confidence limits for 𝒫 are 0.0005 to
0.5443, which do include the null hypothetical value, 𝒫 = 0.5, as
expected. These results imply that the result of a two-tail sign test is
0.01 < P < 0.05. The exact result is 0.0214, found above.
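The limits read from Table A2 in Method (3) can be approximated without the table. The sketch below assumes that Table A2 contains exact binomial confidence limits of the kind described in § 7.7 (the table itself is not reproduced here, so this is an assumption), and finds them by simple bisection. The names `binom_cdf`, `_bisect`, and `proportion_limits` are illustrative only.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for a binomial variable with n trials and probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a function f that decreases from positive to negative."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def proportion_limits(r: int, n: int, level: float):
    """Exact binomial limits for a proportion, given r events in n trials."""
    alpha = 1 - level
    if r == 0:
        lower = 0.0
    else:
        # Lower limit: the p for which P(X >= r) equals alpha / 2.
        lower = _bisect(lambda p: alpha / 2 - (1 - binom_cdf(r - 1, n, p)))
    if r == n:
        upper = 1.0
    else:
        # Upper limit: the p for which P(X <= r) equals alpha / 2.
        upper = _bisect(lambda p: binom_cdf(r, n, p) - alpha / 2)
    return lower, upper


limits_a = proportion_limits(0, 9, 0.99)    # Table 10.1.1, no negative signs
limits_b = proportion_limits(1, 10, 0.95)   # second example, 95 per cent
limits_c = proportion_limits(1, 10, 0.99)   # second example, 99 per cent
print(limits_a, limits_b, limits_c)
```

Under this assumption the 95 per cent upper limit for one negative difference in ten falls just below 0.5 and the 99 per cent upper limit falls above it, which is the pattern described in the text.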



10.3. The randomization test for paired observations 

The principle involved is that described in §§ 6.3, 8.2, 8.3, and 9.2.
As in § 9.2, it is not possible to prepare tables to facilitate the test
when it is done on the actual observations. However, in extreme
cases, like the present example, or when the samples are very small, as
in § 8.2, the test is easy to do (see § 10.1).

As before, attention is restricted to the subjects actually tested.
The members of a pair of observations may be tests at two different
times on the same subject (as in this example), or tests on the members
of a matched pair of subjects (see § 6.4). It is supposed that if the null
hypothesis (that the treatments are equi-effective) were true then the
observations on each member of the pair would have been the same
even if the other treatment (e.g. drug) had been given (see p. 117 for
details). In designing the experiment it was (or should have been)
decided strictly at random (see § 2.3) which member of the pair
received treatment A and which B, or, in the present case, whether A
or B was given first. If A had been given instead of B, and B instead of
A, the only effect on the difference in responses (d in Table 10.1.1),
if the null hypothesis were true, would be that its sign would be changed.
According to the null hypothesis then, the sign of the difference between
A and B that was observed must have been decided by the particular
way the random numbers came up during the random allocation of A
and B to members of a pair. In repeated random allocations it would be
equally probable that each difference would have a positive or a
negative sign. For example, for patient 1 in Table 10.1.1, the
randomization decided whether +0.7 or +1.9 was labelled A and hence,
according to the null hypothesis, whether the difference was +1.2 or
−1.2. It can therefore be found out whether (if the null hypothesis were
true) it would be probable that a random allocation of drugs to these
patients would give rise to a mean difference as large as (or larger than)
that observed (1.58 hours), by inspecting the mean differences produced
by all possible allocations (i.e. all possible combinations of positive and
negative signs attached to the differences). If this is sufficiently
improbable the null hypothesis will be rejected in the usual way (Chapter
6). In fact it can be shown that the same result can be obtained by
inspecting the sum of only the positive (or of only the negative) d
values resulting from random allocation of signs to the differences, so it
is not necessary to find the mean each time† (similar situations arose
in §§ 8.2 and 9.2).

Assumptions. Putting the matter a bit more rigorously, it can be
seen that the hypothesis that an observation (d value) is equally
likely to be positive or negative, whatever its magnitude, implies that
the distribution of d values is symmetrical (see § 4.5), with a mean of
zero. The null hypothesis is therefore that the distribution of d values
is symmetrical about zero, and this will be true either if the y_A and y_B
values have identical distributions (not necessarily symmetrical), or
if the y_A and y_B values both have symmetrical distributions (not
necessarily identical) with the same mean. This makes it clear
that if the null hypothesis is rejected, then, if it is wished to infer from
this that the distributions of y_A and y_B have different population means,
it must be assumed either that their distributions both have the same
shape (i.e. are identical apart from the mean), or that they are both
symmetrical.

Note that when the analysis is done by enumerating possible
allocations it is assumed that each is equi-probable, i.e. that an
allocation was picked at random for the experiment, the design of which
is therefore inextricably linked with its analysis (see § 2.3).

If there are n differences (10 in Table 10.1.1) then there are 2^n
possible ways of allocating signs to them (because one difference can
be + or −, two can be ++, +−, −+, or −−, and each time another
is added the number of possibilities doubles). All of these combinations
could be enumerated as in Table 8.2.2 and Fig. 8.2.1, and Tables 9.2.2
and 9.3. This is done, using ranks, in § 10.4. In the present example,
however, only the most extreme ones are needed.

Example (1). In the results in Table 10.1.1 there are 9 positive
differences out of 9 (the zero difference, even if included, would have no
effect because the total is the same whatever sign is attached to it).
The number of ways in which signs can be allocated is 2^9 = 512. The
observed allocation is the most extreme (no other can give a mean of
1.58 or larger) so the chance that it will come up is 1/512. For a two-tail
test (taking into account the other most extreme possibility, all signs
negative, see § 6.1), the P value is therefore 2/512 ≈ 0.004. In this
most extreme case (though no other) the result is the same as given by
the sign test (§ 10.2). Consider, for example, what would have happened
if patient 5 had given a negative difference instead of zero. The result
of the randomization test will, unlike that of the sign test, depend on
how large the negative difference is.

† As before this is because the total of all differences (regardless of
sign) is the same (15.8 in the example) for all randomizations, so
specifying the sum of negative differences also specifies the mean
difference.

Example (2). Suppose that patient 5 had given d = −0.9, the other
patients being as in Table 10.1.1. There are now 2^10 = 1024 possible
ways of allocating signs to the n = 10 differences. How many of these
give a total for the negative differences (see above) equal to or less than
0.9? Apart from the observed allocation, only two: that in which
patient 8 is negative but 5 is positive, giving a sum of negative
differences of 0.8, and that in which all differences are positive, giving
a sum of negative differences of zero. The probability of observing, on the
null hypothesis, a sum of negative differences as extreme as, or more
extreme than, 0.9 is thus 3/1024. For a two-tail test (see § 6.1),
therefore, P = 6/1024 = 0.0059† (see next example for the detailed
interpretation).

Example (3). If, however, patient 5 had had d = −2.0, the mean
difference, d̄, would have been 13.8/10 = 1.38. In this case a sum of
negative differences equal to or less than 2.0 could arise in ten different
ways, as well as that observed, so P (one tail) = 11/1024 and P
(two tail) = 22/1024 = 0.0215.† The 11 possible ways are (a) all
differences positive (sum = 0), (b) one difference negative (patient
8, 6, 1, 3, 4, 10, 7, or 5) giving a sum of 0.8, 1.0, 1.2, 1.3, 1.3, 1.4,
1.8, or 2.0, depending on which patient has the negative difference,
(c) two differences negative, patients 6 and 8 giving a sum of negative
differences of 1.0 + 0.8 = 1.8, or patients 1 and 8 giving a sum of
1.2 + 0.8 = 2.0.

This result means that if the null hypothesis were true then the
probability would be only 0.0215 that the random numbers would
come up, during the allocation of the treatments, in such a way as to
give a sum of negative differences of 2.0 or less (i.e. a mean difference
between B and A of 1.38 or more), or results equally extreme in the
other direction (A giving larger responses than B). This probability
is small enough to make one suspect the null hypothesis (see § 6.1).
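The enumeration in Examples (1) to (3) is lengthy by hand but short on a computer. The following sketch lists every allocation of signs and counts those giving a total at least as extreme, in either direction, as the observed one. The function name `paired_randomization_test` and the device of working in tenths of an hour (so that the sums are exact integers) are choices made for this illustration, not part of the original text.

```python
from itertools import product


def paired_randomization_test(diffs_tenths):
    """Two-tail randomization test on paired differences.

    diffs_tenths: differences as integers in tenths of an hour.
    Returns (number of equally or more extreme allocations,
             total number of allocations).
    """
    # A zero difference contributes nothing whatever its sign, so drop it.
    magnitudes = [abs(d) for d in diffs_tenths if d != 0]
    observed = abs(sum(diffs_tenths))
    extreme = 0
    for signs in product((1, -1), repeat=len(magnitudes)):
        total = sum(s * m for s, m in zip(signs, magnitudes))
        if abs(total) >= observed:
            extreme += 1
    return extreme, 2 ** len(magnitudes)


table = [12, 24, 13, 13, 0, 10, 18, 8, 46, 14]        # Table 10.1.1
example_1 = paired_randomization_test(table)
example_2 = paired_randomization_test(table[:4] + [-9] + table[5:])
example_3 = paired_randomization_test(table[:4] + [-20] + table[5:])
print(example_1, example_2, example_3)
```

The three results are 2 out of 512, 6 out of 1024, and 22 out of 1024, the two-tail values found above. Counting totals rather than sums of negative differences is equivalent, for the reason given in the footnote to § 10.3.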

† In general it is possible, though uncommon in this sort of test, that an
exactly equal area could not be cut off in the opposite tail, so twice the
one-tail P may be a maximum value for the two-tail P (see § 6.1).

10.4. The Wilcoxon signed-ranks test for two related samples

This test works on much the same principle as the randomization
test in § 10.3 except that ranks are used, and this allows tables to be
constructed, making the test very easy to do. The relation between the
methods of §§ 9.2 and 9.3 for independent samples was very similar.
However, the signed-ranks test, unlike the sign test (§ 10.2) or the
rank test for two independent samples (§ 9.3), will not work with
observations that themselves have the nature of ranks rather than
quantitative numerical measurements. The measurements must be
such that the values of the differences between the members of each
pair can properly be ranked. This would certainly not be possible if
the observations were ranks. If the observations were arbitrary scores
(e.g. for intensity of pain, or from a psychological test) they would
be suitable for this test if it could be said, for example, that a pair
difference of 80 − 70 = 10 corresponded, in some meaningful way, to a
smaller effect than a pair difference of 25 − 10 = 15. Siegel (1966a, b)
discusses the sorts of measurement that will do, but if you are in
doubt use the sign test, and keep Wilcoxon for quantitative numerical
measurements. Sections 9.2, 9.3, 10.3 and Chapter 6 should be read
before this section. The precise nature of the assumptions and null
hypothesis have been discussed already in § 10.3.

The method of ranking is to arrange all the differences in ascending 
order regardless of sign, rank them 1 to n and then attach the sign of the 
difference to the rank. Zero differences are omitted altogether. Differ- 
ences equal in absolute value are allotted mean ranks as shown in 
examples (2) and (3) below (and in § 9.2). To use Table A4 find T, which 
is either the sum of the positive ranks or the sum of the negative ranks, 
whichever sum is smaller. Consulting Table A4 with the appropriate 
n and T gives the two-tail P at the head of the column. Examples are 
given below. Of course for simple cases the analysis can be done 
directly on the ranks as in § 10.3. 
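The ranking procedure just described can be sketched as follows. The function name `signed_rank_sums` is an assumption for illustration; the code drops zero differences, ranks the remainder by absolute size, gives tied values the mean of the ranks they occupy, and returns the two rank sums together with T.

```python
def signed_rank_sums(diffs):
    """Return (sum of positive ranks, sum of negative ranks, T)."""
    nonzero = [d for d in diffs if d != 0]       # zeros are omitted
    ordered = sorted(nonzero, key=abs)           # ascending, ignoring sign
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        # Find the run of differences tied in absolute value.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        mean_rank = ((i + 1) + (j + 1)) / 2      # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = mean_rank
        i = j + 1
    pos = sum(r for d, r in zip(ordered, ranks) if d > 0)
    neg = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return pos, neg, min(pos, neg)


# Differences from Table 10.1.1 (patient 5 gave a zero difference)
d_table = [1.2, 2.4, 1.3, 1.3, 0, 1.0, 1.8, 0.8, 4.6, 1.4]
print(signed_rank_sums(d_table))
```

For Table 10.1.1 all nine ranks are positive, so the sum of positive ranks is 9(9+1)/2 = 45 and T = 0. The two differences of 1.3 share ranks 4 and 5 and so each receive 4½.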

How Table A4 is constructed 

Suppose that n = 4 pairs of observations were made, the differences
(d) being +0.1, −1.1, −0.7, +0.4. Ranking, regardless of sign, gives

    d       +0.1    +0.4    −0.7    −1.1

    rank     1       2      −3      −4

The observed sum of positive ranks is 1 + 2 = 3, and the observed sum
of negative ranks is 3 + 4 = 7. The sum of all four ranks, from (9.3.1),
is n(n+1)/2 = 4(4+1)/2 = 10, which checks (3 + 7 = 10). Thus T = 3,
the smaller of the rank sums. Table A4 indicates that it is not possible
to find evidence against the null hypothesis with a sample as small as
4 differences. This is because there are only 2^n = 2^4 = 16 different
ways in which the results could have turned out (i.e. ways of allocating
signs to the differences, see § 10.3), when the null hypothesis is true.

Table 10.4.1

The 16 possible ways in which a trial on four pairs of subjects could turn
out if treatments A and B were equi-effective, so the sign of each difference
is decided by whether the randomization process allocates A or B to the
member of the pair giving the larger response. For example, on the second
line the smallest difference is negative and all the rest are positive, giving
sum of negative ranks = 1

Rank    1    2    3    4      Sum of        Sum of        T
                              pos. ranks    neg. ranks

        +    +    +    +         10             0         0
        −    +    +    +          9             1         1
        +    −    +    +          8             2         2
        +    +    −    +          7             3         3
        +    +    +    −          6             4         4
        −    −    +    +          7             3         3
        −    +    −    +          6             4         4
        −    +    +    −          5             5         5
        +    −    −    +          5             5         5
        +    −    +    −          4             6         4
        +    +    −    −          3             7         3
        +    −    −    −          1             9         1
        −    +    −    −          2             8         2
        −    −    +    −          3             7         3
        −    −    −    +          4             6         4
        −    −    −    −          0            10         0



Therefore, even the most extreme result, all differences positive, would
appear, in the long run, in 1/16 of repeated random allocations of
treatments to members of the pairs. Similarly 4 negative differences
out of 4 would be seen in 1/16 of experiments. The result of a two-tail
test cannot, therefore, be less than P = 2/16 = 0.125 with a sample of
four differences, however large the differences (see, however, §§ 6.1 and
10.5 for further comments). With a small sample like this, it is easy to
illustrate the principle of the method. More realistic examples are
given below.

The 2^4 = 16 possible ways of allocating signs to the four differences
(i.e. the possible ways in which A and B could have been allocated to
members of a pair, see § 10.3 for a full discussion of this process), are
listed systematically in Table 10.4.1, together with the sums of positive
and negative ranks, and value of T, corresponding to each allocation.
In Table 10.4.2, the frequencies of these quantities are listed from the
results in Table 10.4.1, and in Fig. 10.4.1 the distribution of the sum
of positive ranks is plotted (that for negative ranks is identical).
(These are the paired sample analogues of the rank distribution worked
out for two independent samples in Table 9.2.2 and Fig. 9.3.1.)

Table 10.4.2

The relative frequencies of observing various values of, i.e. the
distributions of, the rank sum, and T, with n = 4 pairs of observations
when the null hypothesis is true. Constructed from Table 10.4.1

Rank      Frequency for    Frequency for    Frequency for
sum       pos. ranks       neg. ranks       T

  0            1                1                2
  1            1                1                2
  2            1                1                2
  3            2                2                4
  4            2                2                4
  5            2                2                2
  6            2                2
  7            2                2
  8            1                1
  9            1                1
 10            1                1

Total         16               16               16

Now the observed sum of positive ranks was 3, and the probability
of observing a sum of 3 or less is seen from Table 10.4.2 or Fig. 10.4.1 to
be 5/16. The probability of an equally large deviation from the null
hypothesis in the other direction (sum of positive ranks ≥ 7) is also
5/16. (The distribution is symmetrical, like that in Fig. 9.3.1, unless
there are ties, so the result of a two-tail test is twice that for a one-tail
test. See § 6.1.) The result of a two-tail significance test is therefore
P = 10/16 = 0.625, so there is no evidence against the null hypothesis,
because results deviating from it by as much as, or more than, the
observed amount would be common if it were true. In other words, if
the null hypothesis were true it would be rejected (wrongly) in 62.5
per cent of repeated experiments in the long run, if it were rejected
whenever the sum of positive ranks was 3 or less, or when it was equally
extreme in the other direction (7 or greater). A value of T equal to or
less than the observed value (3) is seen, from Table 10.4.2, to occur in
4 + 2 + 2 + 2 = 10 of the 16 possible random allocations. The probability
(on the null hypothesis) of observing T ≤ 3 is therefore P = 10/16
= 0.625, which is another way of putting the same result. As in § 9.3,
the calculations in Tables 10.4.1 and 10.4.2 would be the same for any
experiment with n = 4 pairs of observations. The values of T cutting off
suitably small tail areas (1 per cent, 5 per cent, etc.) can therefore be
tabulated for various sample sizes. (The smallest possible value,
T = 0, cuts off an area of P = 2/16 = 0.125 for the small sample in
Table 10.4.2, as mentioned above.) This is what is given in Table A4.

Fig. 10.4.1. Distribution of the sum of positive ranks when the null
hypothesis is true for the Wilcoxon signed-ranks test with four pairs of
observations. The distribution is identical for sum of negative ranks.
Plotted from Table 10.4.2. (Horizontal axis: rank sum.)
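The construction of Tables 10.4.1 and 10.4.2 can be sketched in a few lines. The function name `null_distribution` is illustrative only; the code enumerates all 2^n allocations of signs to the ranks 1 to n (no ties assumed) and counts how often each sum of positive ranks, and each value of T, occurs.

```python
from collections import Counter
from itertools import product


def null_distribution(n: int):
    """Frequencies of the sum of positive ranks, and of T, over all
    2**n equally probable allocations of signs to the ranks 1..n."""
    ranks = range(1, n + 1)
    total = n * (n + 1) // 2
    pos_counts, t_counts = Counter(), Counter()
    for signs in product((True, False), repeat=n):
        pos = sum(r for r, positive in zip(ranks, signs) if positive)
        pos_counts[pos] += 1
        t_counts[min(pos, total - pos)] += 1     # T is the smaller sum
    return pos_counts, t_counts


pos_counts, t_counts = null_distribution(4)
one_tail = sum(c for s, c in pos_counts.items() if s <= 3)
t_at_most_3 = sum(c for t, c in t_counts.items() if t <= 3)
print([pos_counts[s] for s in range(11)])
print([t_counts[t] for t in range(6)])
print(one_tail, t_at_most_3)
```

With n = 4 this reproduces the columns of Table 10.4.2: a sum of positive ranks of 3 or less occurs in 5 of the 16 allocations, and T ≤ 3 occurs in 10 of them. Full enumeration doubles in length with every added pair, which is why tables such as Table A4 are useful for larger samples.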

Example (1). In Table 10.1.1, there are 9 positive differences out of 9
so all ranks are positive and T = sum of negative ranks = 0. Consulting
Table A4 with n = 9, T = 0, shows P < 0.01 (because T is less than
2, the tabulated value for P = 0.01). In fact, doing the test directly,
it is seen that there is only one way (the observed one) of getting a sum
of negative ranks as extreme as zero, out of 2^9 = 512 ways of allocating
signs (see § 10.3). So P (one tail) = 1/512, and P (two tail) = 2/512
= 0.004 (exactly as in §§ 10.2 and 10.3 for this extreme case, but not
in general). This is quite strong evidence against the null hypothesis.
This is, as usual, because if the null hypothesis were true, deviations
from it (in either direction) as large as, or larger than, those observed
in this experiment would occur only rarely (P = 0.004) because the
random numbers happened to come up so that all the subjects giving
big responses were given the same treatment.

Example (2). Suppose however, as in § 10.3, Example (2), that patient
5 had given d = −0.9 instead of zero. When the observations are ranked
regardless of sign the result is as follows:

    d      0.8  −0.9   1.0   1.2   1.3   1.3   1.4   1.8   2.4   4.6
    rank   1    −2     3     4     5½    5½    7     8     9     10

Thus, n = 10, sum of negative ranks = 2, and sum of positive ranks
= 53. Thus, T = 2, the smaller rank sum. The total of all 10 ranks
should be, from (9.3.1), n(n+1)/2 = 10(10+1)/2 = 55, and in fact
53 + 2 = 55. Consulting Table A4 with n = 10 and T = 2, again
shows P < 0.01 (two tail). An exact analysis is easily done in this case.
A sum of negative ranks of 2 or less could arise with only 2 combinations
of signs, in addition to the observed one (rank 1 negative giving T = 1
or all ranks positive giving T = 0), and there are 2^n = 2^10 = 1024
possible ways of allocating signs (see § 10.3). Thus P (one tail) = 3/1024
and P (two tail) = 6/1024 = 0.0059. Again quite strong evidence
against the null hypothesis.

Example (3). If patient 5 had had d = −2.0 (as discussed in § 10.3)
the ranking process would be as follows:

    d      0.8   1.0   1.2   1.3   1.3   1.4   1.8  −2.0   2.4   4.6
    rank   1     2     3     4½    4½    6     7    −8     9     10

Consulting Table A4 with n = 10 and T = 8 gives P = 0.05. Enumeration
of all possible ways of achieving a sum of negative ranks of 8 or
less shows there to be 25 ways† so the exact two-tail P is 50/1024
= 0.049.
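The count of 25 ways, and the slightly smaller count obtained when the tied ranks are taken into account, can be verified by direct enumeration. In this sketch the function name `count_rank_sums_at_most` is an assumption for illustration, and exact fractions are used so that the half ranks are summed without rounding error.

```python
from fractions import Fraction
from itertools import combinations


def count_rank_sums_at_most(ranks, limit):
    """Number of ways of choosing which ranks are negative such that
    the sum of the negative ranks does not exceed limit."""
    count = 0
    for k in range(len(ranks) + 1):
        # combinations() treats tied ranks in different positions as
        # distinct, which is what is wanted here.
        for subset in combinations(ranks, k):
            if sum(subset) <= limit:
                count += 1
    return count


integer_ranks = [Fraction(r) for r in range(1, 11)]
half = Fraction(9, 2)   # the mean rank shared by the two tied differences
tied_ranks = [Fraction(r) for r in (1, 2, 3)] + [half, half] + \
             [Fraction(r) for r in (6, 7, 8, 9, 10)]

ways_integer = count_rank_sums_at_most(integer_ranks, 8)
ways_tied = count_rank_sums_at_most(tied_ranks, 8)
print(ways_integer, ways_tied)
```

The integer ranks give 25 ways (two-tail P = 50/1024) and the ranks with ties give 24 ways, so allowing for the tie changes the result very little in this example.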

Example (4). Consider the following 12 differences observed in a
paired experiment, shown after ranking in ascending order, disregarding
the sign.

    d      0.1  −0.7  0.8  1.1  −1.2  1.6  −2.1  −2.3  −2.4  −2.6  −2.7  −3.1
    rank   1    −2    3    4    −5    6    −7    −8    −9    −10   −11   −12

In this case most of the observed differences are negative. Is the
population mean difference different from zero? The sum of the
negative ranks is 2 + 5 + 7 + 8 + 9 + 10 + 11 + 12 = 64, and the sum of
the positive ranks is 1 + 3 + 4 + 6 = 14, so T = 14, the smaller rank
sum. An arithmetical check is provided by (9.3.1) which gives n(n+1)/2
= 12(12+1)/2 = 78, and, correctly, 64 + 14 = 78. Table A4 shows that
when n = 12, a value of T = 14 corresponds to P = 0.05. Only
marginal evidence against the null hypothesis (see § 6.1).

† This is found by constructing a sum of 8 or less from the integers from
1 to n (= 10), as used in calculating Table A4. More properly it should be
done with the figures 1, 2, 3, 4½a, 4½b, 6, 7, 8, 9, 10 and with these
there are only 24 ways of getting a sum of 8 or less.



How to deal with samples too large for Table A4 

Table A4 deals only with samples up to n = 26 pairs of observations.
For larger samples, as in § 9.3, it is a very good approximation
to assume that the distribution of the rank sum, shown for a small
sample in Fig. 10.4.1, is Gaussian (normal) with (given the null
hypothesis) a mean of

$$\mu = n(n+1)/4 \qquad (10.4.1)$$

and standard deviation

$$\sigma = \sqrt{n(n+1)(2n+1)/24} \qquad (10.4.2)$$

(the derivations of μ and σ are given, for example, by Brownlee
(1966, p. 268)). For example, for the distribution in Fig. 10.4.1, n = 4 so
the mean is μ = 4(4+1)/4 = 5, as is obvious from the figure, and
σ = √{4(4+1)(8+1)/24} = 2.74.

The results in Example (4) can be used to illustrate the normal
approximation. In this example, n = 12, so μ = 12(12+1)/4 = 39, and
σ = √{12(12+1)(24+1)/24} = 12.75. An approximate standard normal
deviate (see § 4.3) can therefore be calculated from (4.3.1) as

$$u = \frac{|T-\mu|}{\sigma} = \frac{|14-39|}{12.75} = \frac{25}{12.75} = 1.96. \qquad (10.4.3)$$

The vertical bars mean, as usual, that the numerator is taken as positive.
The same value would be obtained if the sum of negative ranks, 64,
were used in (10.4.3), because 64 − 39 = 25. This value can now be
referred to tables of the standard normal distribution (see § 4.3), or
tables of t (with infinite degrees of freedom) (see § 4.4). A value of
u = 1.96 cuts off an area P = 0.05 in the tails of the standard normal
distribution, as explained in § 4.3. In other words a value of u above
+1.96, or less than −1.96, would occur in 5 per cent of repeated
experiments. In this case the normal approximation gives the same value
as the exact result, P = 0.05, found above from Table A4.
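The normal approximation is easily computed. The sketch below uses (10.4.1) for the mean and the standard deviation as used in the worked example, and obtains the two-tail area from the complementary error function of the Python standard library rather than from tables; the function name `signed_rank_normal_approx` is illustrative only.

```python
from math import erfc, sqrt


def signed_rank_normal_approx(t: float, n: int):
    """Normal approximation for the signed-ranks statistic.

    Returns (mu, sigma, u, two-tail P) where
    mu = n(n+1)/4, sigma = sqrt(n(n+1)(2n+1)/24), u = |t - mu| / sigma.
    """
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)
    u = abs(t - mu) / sigma
    # Area in both tails of the standard normal distribution beyond u.
    p_two_tail = erfc(u / sqrt(2))
    return mu, sigma, u, p_two_tail


mu, sigma, u, p = signed_rank_normal_approx(14, 12)   # Example (4)
print(mu, round(sigma, 2), round(u, 2), round(p, 3))
```

For Example (4) this gives μ = 39, σ close to 12.75, u close to 1.96, and a two-tail P close to 0.05. Entering the sum of negative ranks, 64, in place of T gives the same u, because only the absolute deviation from μ is used.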

10.5. A data selection problem arising in small samples

Consider the paired observations of responses to treatments A and B shown
in Table 10.5.1.

Table 10.5.1

               Treatment
Block         A        B        Difference

  1          1.7      0.5         +1.2
  2          1.2      0.5         +0.7
  3          1.8      0.9         +0.9
  4          1.0      0.7         +0.3

The experiment was designed exactly like that in Table 10.1.1. All the differences
are positive so the three nonparametric tests described in §§ 10.2-10.4 all give
P = 2/2^4 = 1/8 for a two-tail test. In general, for n differences all with the
same sign, the result would be 2/2^n.

It has been stated that the design of an experiment dictates the form its
analysis must take. Selection of particular features after the results have been
seen (data-snooping) can make significance tests very misleading. Methods of
dealing with the sort of data-snooping problems that arise when comparing more
than two treatments are discussed in § 11.9. Nevertheless, it seems unreasonable
to ignore the fact that in these results, the observations are completely
separated, i.e. the smallest response to A is bigger than the largest response to
B, a feature of the results that has not been taken into account by the paired
tests. (In general, the statistician is not saying that experimenters should not
look too closely at the results of their experiments, but that proper allowance
should be made for selection of particular features.) This feature means that if
the results could be analysed by the two nonparametric methods designed for
independent samples (described in §§ 9.2 and 9.3), both methods would give the
probability of complete separation of the results, if the treatments were
actually equi-effective, as P = 2n!n!/(2n)! = 2(4!4!)/8! = 1/35 (two-tail), a far
more 'significant' result! The naive interpretation of this is that it would have
been better not to do a paired experiment. This is quite wrong. It has been shown
by Stone (1969) that the probability (given the null hypothesis of
equi-effectiveness of A and B) of complete separation of the two groups as
observed, would be 1/35 even if there were no differences between the blocks, and
even less than 1/35 if there were such differences. This is not the same as the
P = 1/8 found using the paired tests because it is the probability of a different
event. If the null hypothesis were true, then, in the long run, 1 in 8 of repeated
experiments would be expected to show 4 differences all with the same sign out
of 4, but only 1 in 35, or fewer, would have no overlap between groups as in this
case.

It remains to be decided what should be done faced with observations such as 

t This is statisticians jargon. 'Data selection' might be better. 



Copyrighted material 



§10.5 



Two related samples 167 



those in Table 10.5.1. The snag is, of course, that any single specified arrange- 
ment of the results is improbable. If the treatments were equi -effective (and 
there were no differences between blocks) any of the 4 14 !/8 ! = 70 possible 
arrangements of the eight figures Into two groups of 4 would have the same 
probability, P = 1/70, of occurring. It is only because the particular arrange- 
ment, with no overlap between A and B, corresponds to a preconceived idea, 
that it is thought unusual, and what constitutes 'correspondence with a pre* 
conceived idea' may be arguable. The problem is an old one: 

'. . . when Dr Beattie observed, as something remarkable which had happened to him,
that he had chanced to see both No. 1 and No. 1000, of the hackney-coaches, the first
and the last; "Why, Sir, (said Johnson,) there is an equal chance for one's seeing those
two numbers as any other two". He was clearly right; yet the seeing of the two extremes,
each of which is in some degree more conspicuous than the rest, could not but strike
one in a stronger manner than the sight of any other two numbers'

(Boswell's Life of Johnson)

The only safe general rule that can be offered at the moment is to analyse the
experiment as a paired experiment if it was designed in that way. In other words
take P = 1/8 in the present case; not much evidence against the null hypothesis.
The problem is, however, a complicated one that is still not fully solved.†
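
The two probabilities contrasted in this section can be checked by brute-force enumeration. The following is a minimal sketch in Python, not part of the original text. It uses the responses of Table 10.5.1, with the block 1 response to A read as 1.7, the value implied by B = 0.5 and a difference of +1.2. The variable names are invented for illustration.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Responses from Table 10.5.1 (blocks 1 to 4).
a = [1.7, 1.2, 1.8, 1.0]
b = [0.5, 0.5, 0.9, 0.7]
n = len(a)

# Paired view: each difference is + or - with equal probability under the
# null hypothesis, so 'all n the same sign' has two-tail probability 2 / 2**n.
p_paired = Fraction(2, 2**n)

# Unpaired view: enumerate every way of labelling n of the 2n values as one
# group and count the labellings that leave the two groups with no overlap.
pooled = a + b
arrangements = list(combinations(range(2 * n), n))
assert len(arrangements) == comb(2 * n, n)

separated = 0
for chosen in arrangements:
    group1 = [pooled[i] for i in chosen]
    group2 = [pooled[i] for i in range(2 * n) if i not in chosen]
    if min(group1) > max(group2) or max(group1) < min(group2):
        separated += 1
p_separated = Fraction(separated, len(arrangements))

print(len(arrangements), p_paired, p_separated)
```

The enumeration treats the eight figures as freely exchangeable between groups, i.e. it ignores the blocks, which is the situation in which the figure of 1/35 is exact.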



10.6. The paired t test 

As for the two-sample t test (§ 9.4), it is necessary to assume that the
distributions of responses to the two treatments are Gaussian (normal) in
form (see §§ 4.2 and 4.6), but it is no longer necessary to assume that they
have identical variances. The assumptions are explained in more detail in
§ 11.2 and it would be preferable always to write the calculations in the
form of analysis of variance as described in § 11.6.

The method will be applied to the results in Table 10.1.1, which have
already been analysed properly in §§ 10.2-10.4 (and which were
analysed as though the two samples were independent in Chapter
9), although there is no evidence that the assumptions are fulfilled.

The analysis is carried out on the differences, $d = y_B - y_A$. The
variance of the differences is estimated to be, using (2.6.2) and (2.6.5),

$$s^2[d] = \frac{\sum (d - \bar{d})^2}{n - 1} = \frac{\sum d^2 - (\sum d)^2/n}{n - 1} = \frac{38.58 - (15.8)^2/10}{10 - 1} = 1.513$$

with (n − 1) = 9 degrees of freedom.

† In some cases, such as this one, when the smallest P value (given, in this case, by
the Wilcoxon two-sample test) is smaller than the P value that the other tests under
consideration can ever reach, however large the difference between the treatments,
Stone (1969) has argued that it is proper to quote the smaller P, i.e. P ≤ 1/35 for the
results in Table 10.5.1.

Stone's method also introduces another factor of 1/2, i.e. he takes P ≤ 1/70, but this
has not yet come into wide use.

And the variance of the mean difference is, by (2.7.8),

$$s^2[\bar{d}] = \frac{s^2[d]}{n} = \frac{1.513}{10} = 0.1513 \text{ with 9 degrees of freedom.}$$

This should be carefully distinguished from the variance of the
difference between means found in § 9.4, which was larger (0.7209) and
had more degrees of freedom (18). The disappearance of 9 degrees of
freedom will be explained when the results are looked at from the point
of view of the analysis of variance in § 11.6.

The standard deviation of the mean difference is estimated as

$$s[\bar{d}] = \sqrt{0.1513} = 0.3890$$

and the null hypothesis is that the population (true) mean difference,
$\mu$, is zero. Thus, from (4.4.1),

$$t = \frac{\bar{d} - \mu}{s[\bar{d}]} = \frac{\bar{d}}{s[\bar{d}]} = \frac{\bar{d}}{\sqrt{s^2[d]/n}}.$$

In this example

$$t = \frac{1.58}{0.3890} = 4.062.$$



Referring to a table† of the t distribution with n − 1 = 9 degrees of
freedom shows that a value of t (of either sign) as large as, or larger
than, 4.062 would be seen in less than 0.5 per cent of trials if the null
hypothesis that the population (true) mean difference $\mu = 0$ were
true, and if the assumptions of normality, etc. were true, i.e. P (two
tail) < 0.005. This strongly suggests that the null hypothesis is not in
fact true and that there is a real difference between the means.

† For example, Fisher and Yates (1963), Table 3, or Pearson and Hartley (1966),
Table 12. Only the latter has P = 0.005 values. See § 4.4 for details.

This result is rather different from that found in § 9.4 and the other
sections in Chapter 9, when the same results were analysed as though
the drugs had been tested on independent samples, and the reasons for
this are discussed in §§ 10.7, 11.2, 11.4, and 11.6. It is in reasonable
agreement with the other analyses in this chapter but it cannot be
assumed that the t test will always give similar results to the more
assumption-free methods.

As in § 9.4, the same conclusion could be reached by calculating
confidence limits for the mean difference. The 99.5 per cent confidence
limits for $\mu$ would be found not to include zero, the null hypothetical
value, but the 99.8 per cent confidence limits do include zero.
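
The whole calculation of this section can be retraced from the summary figures quoted in the text. The following is a minimal sketch in Python, not part of the original text. It assumes only n = 10, a total of 15.8 for the differences and a total of 38.58 for their squares. The two critical values of t for 9 degrees of freedom are copied from standard tables and are an assumption of the sketch rather than something computed in it.

```python
import math

# Summary figures quoted in the text for the n = 10 differences of Table 10.1.1.
n = 10
sum_d = 15.8       # total of the differences (mean 1.58)
sum_d2 = 38.58     # total of the squared differences

var_d = (sum_d2 - sum_d**2 / n) / (n - 1)   # variance of the differences
var_mean = var_d / n                         # variance of the mean difference
se_mean = math.sqrt(var_mean)                # standard deviation of the mean
mean_d = sum_d / n
t = mean_d / se_mean                         # null hypothesis: true mean = 0

# Two-tail critical values of t for 9 degrees of freedom, copied from
# standard tables (an assumption of this sketch, not computed here).
T_CRIT = {0.005: 3.690, 0.002: 4.297}


def confidence_limits(p):
    """Limits for the true mean difference at confidence 100(1 - p) per cent."""
    half_width = T_CRIT[p] * se_mean
    return mean_d - half_width, mean_d + half_width


print(f"s2[d] = {var_d:.3f}, t = {t:.3f}")
for p in T_CRIT:
    lo, hi = confidence_limits(p)
    print(f"{100 * (1 - p):.1f} per cent limits: {lo:.2f} to {hi:.2f}")
```

With these figures the lower 99.5 per cent limit lies above zero and the lower 99.8 per cent limit lies below it, which is the statement made in the text.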

10.7. When will related samples (pairing) be an advantage?

In Chapter 9 the results in Table 9.2.4 and § 10.1 were analysed by
several different methods and in no case was good evidence found
against the hypothesis that the two drugs were equi-effective. The
methods all assumed that the measurements had been made on two
independent samples of ten subjects each. In § 10.1 it was revealed
that in fact the measurements were paired and the same results were
reanalysed making proper allowance for this in §§ 10.2-10.6. It was then
found that the evidence for a difference in effectiveness of the drugs
was actually quite strong. Why is this? On commonsense grounds the
difference between responses to A and B is likely to be more consistent
if both responses are measured on the same subject (or on members of
a matched pair), than if they are measured on two different patients.
This can be made a bit less intuitive if the correlation between the
two responses from each pair is considered. The correlation coefficient,
r (which is discussed in § 12.9, q.v.), is a standardized measure of the
extent to which one value (say $y_A$) tends to be large if the other ($y_B$) is
large (in so far as the tendency is linear, see § 12.9). It is closely related
to the covariance (see § 2.6), the sample estimate of the correlation
coefficient being $r = \mathrm{cov}[y_A, y_B]/(s[y_A]\,s[y_B])$.

Now in § 9.4 the variance of the difference between two means was
found as $s^2[\bar{y}_A - \bar{y}_B] = s^2[\bar{y}_A] + s^2[\bar{y}_B]$ using (2.7.3), which assumes that
the two means are not correlated (which will be so if the samples are
independent). When the samples are not independent the full equation
(2.7.2) must be used, viz.

$$s^2[\bar{d}] = s^2[\bar{y}_B - \bar{y}_A] = s^2[\bar{y}_A] + s^2[\bar{y}_B] - 2\,\mathrm{cov}[\bar{y}_A, \bar{y}_B].$$ †

Using the correlation coefficient (r) this can be written

$$s^2[\bar{y}_A - \bar{y}_B] = s^2[\bar{y}_A] + s^2[\bar{y}_B] - 2r\,s[\bar{y}_A]\,s[\bar{y}_B]. \tag{10.7.1}$$

(This expression should ideally contain the population correlation
coefficient. If an experiment is carried out on properly selected
independent samples, this will be zero, so the method given in § 9.4, which
ignores r, is correct even if the sample correlation coefficient is not
exactly zero.)

† There are equal numbers in each sample so $(\bar{y}_B - \bar{y}_A) = \overline{(y_B - y_A)} = \bar{d}$.

These relationships show that if there is a positive correlation between
the two responses of a pair (r > 0) the variability of the difference
between means will be reduced (by subtraction of the last term in
(10.7.1)), as intuitively expected. In the present example r = +0.8,
and $s^2[\bar{y}_A - \bar{y}_B]$ is reduced from 0.7210 when the correlation is ignored
(§§ 9.4 and 11.4), to 0.1513 when it is not (§§ 10.6 and 11.6). The
correct value is 0.1513, of course.
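
The size of the covariance term removed by pairing can be recovered from the two variances just quoted. The following is a minimal sketch in Python, not part of the original text. The function name is invented, and because the separate variances of the two means are not quoted in this section, the sketch assumes for illustration only that they are equal. With unequal variances the implied r could only be somewhat larger, which is consistent with the quoted r = +0.8.

```python
import math


def variance_of_difference(var_a, var_b, r=0.0):
    """Variance of (mean A - mean B) from eqn (10.7.1).

    var_a and var_b are the variances of the two means; r is the
    correlation coefficient between paired responses (0 if independent).
    """
    return var_a + var_b - 2 * r * math.sqrt(var_a) * math.sqrt(var_b)


# Figures quoted in the text.
unpaired = 0.7210   # s2[mean A] + s2[mean B], correlation ignored
paired = 0.1513     # variance of the mean difference from section 10.6

# The covariance term that pairing removes: 2 cov = unpaired - paired.
cov_means = (unpaired - paired) / 2

# ASSUMPTION for illustration only: the two means have equal variances.
var_each = unpaired / 2
r_implied = cov_means / var_each

print(f"implied r = {r_implied:.2f}")
print(f"check: {variance_of_difference(var_each, var_each, r_implied):.4f}")
```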

Although correlation between observations is a useful way of looking
at the problem of designing experiments so as to get the greatest
possible precision, this approach does not extend easily to more than
two groups and it does not make clear the exact assumptions involved
in the test. The only proper approach to the problem is to make clear the
exact mathematical model that is being assumed to describe the
observations, and this is discussed in § 11.2.






11. The analysis of variance. How to 
deal with two or more samples 



'. . . when I come to "Evidently" I know that it means two hours hard work at
least before I can see why.'

W. S. Gosset ('Student')
in letter dated June 1922, to R. A. Fisher



11.1. Relationship between various methods 

The methods in this chapter are intended for the comparison of two or
more (k in general) groups. The methods described in Chapters 9 and
10 are special cases (for k = 2) of those to be described. The rationale
and assumptions of the two-sample methods will be made clearer
during discussion of their k-sample generalizations. The principles
discussed in Chapter 6, although some were put in two-sample language,
all apply here.

All that any of the methods will tell you is whether the null
hypothesis (that all k treatments produce the same effect) is plausible or
not. If this hypothesis is not acceptable, the analysis does not say
anything about which treatments differ from which. Methods for
comparing all possible pairs of the k treatments are described in
§ 11.9, and references are given to other multiple comparison methods.

When the samples are independent (as in Chapter 9) the experimental 
design is described as a one-way classification because each experi- 
mental measurement is classified only according to the treatment 
given. Analyses are described in §§11.4 and 11.5. When the samples 
are related, as in Chapter 10, each measurement is classified according 
to the treatment applied and also according to the particular block
(patient in Chapter 10) it occurs in. Such two-way classifications are 
discussed in §§ 11.6 and 11.7. 

As in previous chapters nonparametric methods based on the
randomization principle (see §§ 6.2, 6.3, 8.2, 8.3, 9.2, 9.3, 10.2, 10.3, and
10.4) are available for the simplest experimental designs. As usual,
these experiments can also be analysed by methods involving the
assumption (among others, see § 11.2) that experimental errors follow a
normal (Gaussian) distribution (see § 4.2), but the nonparametric
methods of §§ 11.5 and 11.7 should usually be used in preference to the
Gaussian ones (see § 6.2). Unfortunately, nonparametric methods are not
available (or at least not, at the moment, practicable) for analysis of
the more complex and ingenious experimental designs (see § 11.8 and
later chapters) that have been developed in the context of normal
distribution theory. For this reason alone most of the chapters following
this one will be based on the assumption of a normal distribution (see
§§ 4.2 and 11.2).

When comparing two groups, the difference between their means or 
medians was used to measure the discrepancy between the groups. 
With more than two it is not obvious what measure to use and because 
of this it will be useful to describe the normal theory tests (in which a 
suitable measure is developed) before the nonparametric methods. 
This does not mean that the former are to be preferred. Tests of 
normality are discussed in § 4.6.

11.2. Assumptions involved in the analysis of variance based on 
the Gaussian (normal) distribution. Mathematical models 
for real observations 

It was mentioned in § 10.7 that in order to see clearly the underlying 
principles of the t test (§ 9.4) and paired t test (§ 10.6) it is necessary 
to postulate that the observations can adequately be described by a 
simple mathematical model. Unfortunately it is roughly true that the 
more complex and ingenious the experimental design, the more 
complex, and less plausible, is the model (see § 11.8). 

In the case of the two-sample t test (§§ 9.4 and 11.4) and its k-sample
analogue it is assumed that (1) the observations are normally
distributed (§ 4.6), (2) the normal distributions have population means $\mu_j$
(j = 1, 2, ..., k) which may differ from group to group (e.g. $\mu_1$ for drug
A and $\mu_2$ for drug B in § 9.4), (3) the population standard deviation,
$\sigma$, is the same for every sample (group, treatment), i.e.
$\sigma_1 = \sigma_2 = \ldots = \sigma_k = \sigma$, say (it was mentioned in § 9.4 that it had to be assumed
that the variability of responses to drug A was the same as that of
responses to drug B), and (4) the responses are independent of each
other (independence of responses in different groups is part of the
assumption that the experiment was done on independent samples, but
in addition the responses within each group must not affect each other
in any way, cf. discussion in § 13.1, p. 286).






The additive model 

These assumptions can be summarized by saying that the ith
observation in the jth group (e.g. jth drug in § 9.4) can be represented
as $y_{ij} = \mu_j + e_{ij}$ ($i = 1, 2, \ldots, n_j$, where $n_j$ is the number of observations
in the jth group; reread § 2.1 if the meaning of the subscripts is not
clear). In this expression the $\mu_j$ (constants) are the population mean
responses for the jth group, and $e_{ij}$, a random variable, is the error
of the individual observation, i.e. the difference between the individual
observation and the population mean. It is assumed that the $e_{ij}$ are
independent of each other and normally distributed with a population
mean of zero (so in the long run† the mean $y_{ij} = \mu_j$) and standard
deviation $\sigma$. Usually the population mean for the jth group, $\mu_j$, is written
as $\mu + \tau_j$, where $\mu$ is a constant common to all groups and $\tau_j$ is a
constant (the treatment effect) characteristic of the jth group (treatment).
The model is therefore usually written

$$y_{ij} = \mu + \tau_j + e_{ij}. \tag{11.2.1}$$

† In the notation of Appendix 1, $E(e) = 0$ so $E(y) = E(\mu_j) + E(e) = \mu_j$, from (11.2.1).
And $E(y) = E(\mu) + E(\beta_i) + E(\tau_j) + E(e) = \mu + \beta_i + \tau_j$ from (11.2.2).

The paired t test (§§ 10.6 and 11.6), and its k-sample analogue
(§ 11.6), need a more elaborate model. The model must allow for the
possibility that, as well as there being systematic differences between
samples (groups, treatments), there may also be systematic differences
between the patients in § 9.4, i.e. between blocks in general (see § 11.6).
The analyses in § 11.6 assume that the observation on the jth sample
(group, treatment) in the ith block can be represented as

$$y_{ij} = \mu + \beta_i + \tau_j + e_{ij} \tag{11.2.2}$$

where $\mu$ is a constant common to all observations, $\beta_i$ is a constant
characteristic of the ith block, $\tau_j$ is a constant, as above, characteristic
of the jth sample (treatment), and $e_{ij}$ is the error, a random variable,
values of which are independent of each other and are normally
distributed with a mean of zero (so the long run average value of $y_{ij}$ is
$\mu + \beta_i + \tau_j$) and standard deviation $\sigma$. This model is a good deal more
restrictive than (11.2.1) and its implications are worth looking at.
Notice that the components are supposed to be additive. In the case
of the example in § 10.6, this means that the differences between the
responses of a pair (block in general) to drug A and drug B are supposed
to be the same (apart from random errors) on patients who are very
sensitive to the drugs (large $\beta_i$) as on patients who tend to give smaller
responses (small $\beta_i$). Likewise the difference in response between
any two patients who receive the same treatment, and are therefore in
different pairs or blocks, is supposed to be the same whether they
receive a treatment (e.g. drug) producing a large observation (large $\tau_j$)
or a treatment producing a small observation (small $\tau_j$). These remarks
apply to differences between responses. It will not do if drug A always
gives a response larger than that to drug B by a constant percentage,
for example.

Consider the first two pairs of observations in Table 10.1.1. In
the notation just defined (see also § 2.1) they are $y_{11} = +0.7$, $y_{12} = +1.9$,
$y_{21} = -1.6$, and $y_{22} = +0.8$. The first difference is assumed,
from (11.2.2), to be

$$y_{12} - y_{11} = (\mu + \beta_1 + \tau_2 + e_{12}) - (\mu + \beta_1 + \tau_1 + e_{11}) = (\tau_2 - \tau_1) + (e_{12} - e_{11}) = +1.2.$$

That is to say, apart from experimental
error, it measures only the difference between the two treatments
(drugs), viz. $\tau_2 - \tau_1$, whatever the value of $\beta_1$. Similarly, the second
difference is $y_{22} - y_{21} = (\tau_2 - \tau_1) + (e_{22} - e_{21}) = 2.4$, which is also an
estimate of exactly the same quantity, $\tau_2 - \tau_1$, whatever the value of $\beta_2$,
i.e. whatever the sensitivity of the patient to the drugs.

Looking at the difference in response to drug A (treatment 1) between
patients (blocks) 1 and 2 shows

$$y_{11} - y_{21} = (\mu + \beta_1 + \tau_1 + e_{11}) - (\mu + \beta_2 + \tau_1 + e_{21}) = (\beta_1 - \beta_2) + (e_{11} - e_{21}) = +0.7 - (-1.6) = +2.3,$$

and similarly for drug B $y_{12} - y_{22} = (\beta_1 - \beta_2) + (e_{12} - e_{22}) = 1.9 - 0.8 = 1.1$.
Apart from experimental error, both estimate only the difference
between the patients, which is assumed to be the same whether the
treatment is effective or not.
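
The four differences just derived can be laid out mechanically. The following is a minimal sketch in Python, not part of the original text, using only the four observations quoted above. The dictionary layout is invented for illustration.

```python
# First two pairs of Table 10.1.1 as quoted in the text: y[i][j] is the
# response of patient (block) i to drug (treatment) j.
y = {
    1: {1: +0.7, 2: +1.9},
    2: {1: -1.6, 2: +0.8},
}

# Within-patient differences: each estimates tau_2 - tau_1.
treatment_diffs = [round(y[i][2] - y[i][1], 1) for i in (1, 2)]

# Between-patient differences on the same drug: each estimates beta_1 - beta_2.
block_diffs = [round(y[1][j] - y[2][j], 1) for j in (1, 2)]

print(treatment_diffs)
print(block_diffs)
```

Under the additive model the two figures in each list differ only through the random errors.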

The best estimate, from the experimental results, of $\tau_2 - \tau_1$ will be
the mean difference, $\bar{d} = 1.58$ hours. If the treatment effect is not the
same in all blocks then block × treatment interactions are said to be
present, and a more complex model is needed (see below, § 11.6 and,
for example, Brownlee (1965, Chapters 10, 14, and 15)).

This additive model is completely arbitrary and quite restrictive.
There is no reason at all why any real observations should be
represented by it. It is used because it is mathematically convenient. In the
case of paired observations the additivity of the treatment effect can
easily be checked graphically because the pair differences should be
a measure of $\tau_2 - \tau_1$, as above, and should be unaffected by whether
the pair (patient in Table 10.1.1) is giving a high or a low average
response. Therefore a plot of the difference, $d = y_B - y_A$, against the
total, $y_A + y_B$, or equivalently, the mean, for each pair should be, apart
from random experimental errors, a straight horizontal line. This
plot, for the results in Table 10.1.1, is shown in Fig. 11.2.1. No
systematic deviation from a horizontal line is detectable with the available
results but there are not enough observations to provide a good test
of the additive model. For methods of checking additivity in more
complex experiments see, for example, Bliss (1967, pp. 323-41).



Fig. 11.2.1. Test of additive model for two-way analysis of variance with
two samples (i.e. paired t test). Pair differences are plotted against pair sum
(or pair mean).



Homogeneity of error 

In the models for the Gaussian analysis of variance all random errors
are pooled into the single quantity, represented by e in (11.2.1) and
(11.2.2), which is supposed to be normally distributed with a mean of
zero and a variance of $\sigma^2$. In other words, if observations could be
made repeatedly using a given treatment (e.g. drug) and block (e.g.
patient) the scatter of the results would be the same whatever the
size of the observation and whatever treatment was applied. This means
that the scatter of the observations must be the same for every group
(sample, treatment) for experiments with independent samples,
represented by (11.2.1).






To test whether the variance estimates calculated from each group
can reasonably all be taken to be estimates of the same population
value, a quick test is to calculate the ratio of the largest variance to
the smallest one, $s^2_{\max}/s^2_{\min}$. This test assumes that the k samples are
independent and that the variation within each follows a normal
distribution. Under these conditions the distribution of $s^2_{\max}/s^2_{\min}$ is
known (when k = 2 it is the same as the variance ratio, see § 11.3). For
the results in Table 11.4.1 it is seen that k = 4,
$s^2_{\max}/s^2_{\min} = 158.14/34.29 = 4.61$, and each group variance is based on 7 − 1 = 6 degrees of
freedom. Reference to tables (e.g. Biometrika tables of Pearson and
Hartley (1966, pp. 63-7 and Table 31)) shows that a value of
$s^2_{\max}/s^2_{\min}$ of 10.4 or larger would be expected to occur in 5 per cent of
repeated experiments if the 4 independent samples of 7 observations
were all from a single normally distributed population, and therefore
all had the same population variance (the null hypothesis). Thus
P > 0.05 and there are no grounds for believing that population
variance is not the same for all groups, though the test is not very
sensitive. The tables only deal with the case of k groups of equal size.
If the sizes are not too unequal the average number of degrees of
freedom can be used to get an approximate result.
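
The quick test amounts to one division and one comparison with a tabulated value. The following is a minimal sketch in Python, not part of the original text. The function name is invented, and the critical value 10.4 is simply the tabulated figure quoted above, not something the code computes.

```python
def variance_ratio_max_min(variances):
    """Ratio of the largest to the smallest sample variance."""
    return max(variances) / min(variances)


# Largest and smallest of the k = 4 group variances quoted for Table 11.4.1
# (the two intermediate variances are not given in this section and do not
# affect the ratio).
ratio = variance_ratio_max_min([158.14, 34.29])

# Tabulated 5 per cent point for k = 4 groups with 6 degrees of freedom each,
# as quoted in the text.
critical_5_per_cent = 10.4

print(f"ratio = {ratio:.2f}")
print("P > 0.05" if ratio < critical_5_per_cent else "P <= 0.05")
```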

Use of transformations 

If the original observations do not satisfy the assumptions, some 
function of them may do so, although you will be lucky if you have 
enough observations to find out which function. Aspects of this problem 
are discussed in Bliss (1967 pp. 323-41), §§ 4.2, 4.5, 4.6 and 12.2. 

For example, suppose the observations (1) were known to be log- 
normally distributed (see § 4.5) and (2) were represented by a multi- 
plicative model (e.g. one treatment always giving say 50 percent greater 
response than another, rather than a fixed increment in response) and 
(3) had standard deviations that were not constant, but which were 
proportional to the treatment mean (i.e. each treatment had the same 
coefficient of variation, eqn (2.6.4)). In this case the logarithms of the 
observations would be normally distributed with constant standard 
deviation, and would be represented by an additive model. The 
constancy of the standard deviation follows from eqn (2.7.14). Therefore
the logarithm of each observation would be taken before doing any 
calculations. 
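
The effect of taking logarithms under a multiplicative model can be seen in a small artificial case. The following is a minimal sketch in Python, not part of the original text. The responses are hypothetical and invented for illustration, and the second treatment is constructed as exactly 1.5 times the first, so the example shows the algebra rather than the behaviour of real data.

```python
import math
from statistics import mean, stdev

# HYPOTHETICAL responses, invented for illustration: treatment 2 always
# gives a response 50 per cent greater than treatment 1 (multiplicative model).
treatment_1 = [8.0, 10.0, 12.5, 9.0, 11.0]
treatment_2 = [1.5 * y for y in treatment_1]

for name, ys in (("treatment 1", treatment_1), ("treatment 2", treatment_2)):
    logs = [math.log10(y) for y in ys]
    print(
        f"{name}: s.d. = {stdev(ys):.3f}, "
        f"coefficient of variation = {stdev(ys) / mean(ys):.3f}, "
        f"s.d. of logs = {stdev(logs):.4f}"
    )

# On the log scale the treatment effect is a fixed increment, log10(1.5).
log_mean_1 = mean([math.log10(y) for y in treatment_1])
log_mean_2 = mean([math.log10(y) for y in treatment_2])
increment = log_mean_2 - log_mean_1
print(f"increment on log scale = {increment:.4f}")
```

The standard deviations differ on the original scale but the coefficient of variation and the standard deviation of the logarithms are the same for both treatments.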

If the standard deviation for each treatment group is plotted against
the mean as in Fig. 11.2.2 the line should be roughly horizontal.
This can be tested rapidly using $s^2_{\max}/s^2_{\min}$ as above, given normality.
If it is a straight line passing through the origin then the coefficient
of variation is constant and the logs of the observations will have
approximately constant standard deviation as just described. If the
line is straight, but does not pass through the origin, as shown in Fig.
11.2.2, then $y_0 = a/b$ (where a is the intercept and b the slope of the
line, as shown) should be added to each observation before taking
logarithms. It will now be found that $\log(y + y_0)$ has an approximately
constant standard deviation, though this is no reason to suppose that
this variable fulfils the other assumptions of normality and additivity.

Fig. 11.2.2. Transformation of observations to a scale fulfilling the
requirement of equal scatter in each treatment group. See text.
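
The reason for adding $y_0 = a/b$ is that it makes the standard deviation proportional to the shifted mean. The following is a minimal sketch in Python, not part of the original text. The intercept, slope and group means are hypothetical values invented for illustration.

```python
def shift_constant(intercept, slope):
    """y0 = a / b for a straight line s.d. = a + b * mean."""
    return intercept / slope


# HYPOTHETICAL line, invented for illustration: s.d. = 2.0 + 0.1 * mean.
a, b = 2.0, 0.1
y0 = shift_constant(a, b)

for group_mean in (10.0, 40.0, 160.0):
    sd = a + b * group_mean
    cv_raw = sd / group_mean               # changes from group to group
    cv_shifted = sd / (group_mean + y0)    # equals b for every group
    print(f"mean {group_mean:6.1f}: c.v. raw = {cv_raw:.3f}, shifted = {cv_shifted:.3f}")
```

Adding a constant to every observation leaves the standard deviation unchanged and moves the mean by $y_0$, so the shifted variable has a constant coefficient of variation and its logarithm has an approximately constant standard deviation.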

It is quite possible that no transformation will simultaneously
satisfy all the assumptions. Bartlett (1947) discusses the problem.
A more advanced treatment is given by Box and Cox (1964). In the
discussion of the latter paper, Nelder remarks 'Looking through the
corpus of statistical writings one must be struck, I think, by how
relatively little effort has been devoted to these problems [checking of
assumptions]. The overwhelming preponderance of the literature
consists of deductive exercises from a priori starting points . . .
Frequently these prior assumptions are unjustifiably strong and amount
to an assertion that the scale adopted will give the required additivity
etc.' A good discussion is given by Bliss (1967, pp. 323-41).

What sort of model is appropriate for your experiment— fixed effects or 
random effects ? 

In the discussion above, it was stated that the values of $\beta_i$ and $\tau_j$
were constants characteristic of the particular blocks (e.g. patients)
and treatments (e.g. drugs) used in the experiment. This implies that
when one speaks of what would happen in the long run if the experiment
were repeated many times, one has in mind repetitions carried out with
the same blocks and the same treatments as those used in the
experiment (model 1). Obviously in the case of two drugs the repetitions would
be with the same drugs, but it is not so obvious whether repetitions
on the same patients are the appropriate thing to imagine. It would
probably be preferred to imagine each repetition on a different set of
patients, each set randomly selected from the same population, and in
this case the $\beta_i$ will not be constants but random variables (model 2).
It is usually assumed that the $\beta_i$, as well as the $e_{ij}$, are normally
distributed, and the variance of their distribution ($\omega^2$, say) will then
represent the variability of the population mean response for individual
patients (blocks) about the population response for all patients ($\omega^2$ is
assumed to be the same for all treatments). Compare this with $\sigma^2$,
which represents the variability of the individual responses of a patient
about the true mean response for all responses on that patient ($\sigma^2$ is
assumed to be the same for all patients and treatments).

The distinction between models based on fixed effects (model 1)
and those based on random effects (model 2) will not affect the
interpretation of the simple analyses described in this book as long as the
simple additive model (such as (11.2.2)) is assumed to be valid, i.e. there
are no interactions. But if interactions are present, and in more complex
analyses of variance, it is essential for the interpretation of the result
that the model be exactly specified because the interpretation will depend
on knowing what the mean squares, which are all estimates of $\sigma^2$ when
the null hypothesis is true, are estimates of when the null hypothesis is
not true. In the language of Appendix 1, the first thing that must be
known for the interpretation of the more complicated analyses of
variance is the expectations of the mean squares. These are given, for
various common analyses, by, for example, Snedecor and Cochran
(1967, Chapters 10-12 and 16) or Brownlee (1965, Chapters 10, 14, and
15). See also § 11.4 (p. 186).






11.3. The distribution of the variance ratio F

This section describes the generalization of Student's t distribution
(§ 4.4) that is necessary to extend the two-sample tests based on the
normal distribution (see §§ 9.4 and 10.6) to more than two (k in general) samples
of observations with normally distributed errors (see §§ 11.4 and 11.6
and later chapters). The t tests of §§ 9.4 and 10.6 will be special cases
(k = 2) of the more general methods.

In the case of the t test for two independent samples (§ 9.4), the
discrepancy between the samples was measured by the difference
between the sample means, and if this was large enough, compared
with experimental errors, the null hypothesis that the two population
means were equal, i.e. that both samples were samples from the same
parent population, with variance $\sigma^2$, was rejected. If it is required to
test the hypothesis that more than two samples are all from the same
population, and therefore have the same mean and variance, $\sigma^2$, the
first step is to find a measure of the discrepancy between the k sample
means, to take the place of the simple difference that was used when
k = 2. The usual measure of scatter in normal distribution theory is the
standard deviation, and it turns out that the standard deviation of the
k sample means is a suitable generalization of the difference between
two means.

The sensibleness of this is made apparent when it is realized that the
difference between two figures is their standard deviation, apart from a
constant. Consider two observations, $y_1$ and $y_2$. What is their standard
deviation? The variance is $\sum(y - \bar{y})^2/(n - 1) = \sum(y - \bar{y})^2$ (because n = 2).
By (2.6.5) this is

$$\sum(y - \bar{y})^2 = \sum y^2 - (\sum y)^2/n = y_1^2 + y_2^2 - (y_1 + y_2)^2/2 = y_1^2 + y_2^2 - (y_1^2 + y_2^2 + 2y_1 y_2)/2 = (y_1^2 + y_2^2 - 2y_1 y_2)/2 = (y_1 - y_2)^2/2.$$

The standard deviation of the two figures is the square root of this, viz.

$$s_1[y] = \frac{y_1 - y_2}{\sqrt{2}} \tag{11.3.1}$$

where the subscript 1 indicates that this is a standard deviation based
on one (= n − 1) degree of freedom. Now t is defined (see §§ 4.4, 9.4, and
10.6) as $(x - \mu)/s_f[x]$ where x is normally distributed, with mean $\mu$, and
sample standard deviation $s_f[x]$ based on, say, f degrees of freedom
(f = 18 in § 9.4). If the variable of interest is $x = y_1 - y_2$, and one
wishes to test the null hypothesis that the population values† of $y_1$ and
$y_2$ are equal, i.e. $\mu = 0$, then one calculates $t = (y_1 - y_2)/s[y_1 - y_2]$. Now
$y_1$ and $y_2$ are assumed to be independent, and both are assumed to
have the same variance, of which an estimate $s_f^2[y]$ based on f degrees of
freedom is available (calculated, for example, from the variability
within groups as in § 9.4 if $y_1$ and $y_2$ were group means). The estimated
variance of $y_1 - y_2$ is therefore, by (2.7.3), $s^2[y_1 - y_2] = s^2[y_1] + s^2[y_2] = 2s_f^2[y]$,
so $s[y_1 - y_2] = \sqrt{2}\,s_f[y]$. Using these values gives

$$t = \frac{y_1 - y_2}{\sqrt{2}\,s_f[y]} = \frac{s_1[y]}{s_f[y]} \quad \text{by (11.3.1).} \tag{11.3.2}$$

† In the language of Appendix 1, $\mu = E[y_1 - y_2] = E[y_1] - E[y_2] = 0$.
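
The statement that the standard deviation of two figures is their difference divided by $\sqrt{2}$ is easy to confirm numerically. The following is a minimal sketch in Python, not part of the original text. The two figures and the value used for $s_f$ are arbitrary, chosen only for illustration.

```python
import math
from statistics import stdev

# ARBITRARY figures chosen for illustration; any two values behave the same.
y1, y2 = 1.9, 0.7

sd_direct = stdev([y1, y2])                       # usual formula, n - 1 = 1
sd_from_difference = abs(y1 - y2) / math.sqrt(2)  # eqn (11.3.1), sign dropped

# HYPOTHETICAL independent estimate of the s.d. of y1 and y2 (f d.f.).
s_f = 0.6
t_direct = (y1 - y2) / (math.sqrt(2) * s_f)       # first form of (11.3.2)
t_as_ratio = sd_direct / s_f                      # second form of (11.3.2)

print(sd_direct, sd_from_difference)
print(t_direct, t_as_ratio)
```

A standard deviation is never negative, so the sketch compares it with the absolute difference; the two forms of t agree here because $y_1$ was chosen larger than $y_2$.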

Thus, if the null hypothesis ($\mu = 0$) is true, $t^2$ is seen to be the ratio of two independent estimates of the same population variance $\sigma^2$ (see above), that in the numerator having one degree of freedom (in § 9.4 this was found from the difference between the sample means, $\bar{y}_A$ and $\bar{y}_B$), and that in the denominator having f degrees of freedom (in § 9.4 this was found from the differences within samples). Compare this approach with the discussion of predicted variances in § 2.7. In §§ 9.4 and 11.4 it is predicted from the observed scatter within samples what the scatter between samples could reasonably be expected to be, and the prediction is compared with the observed scatter between samples.
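As a quick numerical check of (11.3.1), the following Python sketch confirms that the sample standard deviation of two figures is their difference divided by $\sqrt{2}$. The two values are hypothetical, chosen only for illustration, and the absolute value is taken because a standard deviation cannot be negative.

```python
import math
import statistics

# Two hypothetical observations (illustrative values, not taken from the text).
y1, y2 = 12.0, 7.5

# Sample standard deviation of the two figures, with n - 1 = 1 degree of freedom.
s1 = statistics.stdev([y1, y2])

# (11.3.1): the same quantity obtained from the difference alone.
s1_from_difference = abs(y1 - y2) / math.sqrt(2)

print(s1, s1_from_difference)
```

The two printed values should agree apart from rounding error in the last digits.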

Now the ratio of two independent estimates of the same population variance is called the variance ratio and is denoted F (after R. A. Fisher who discovered its distribution when the population is normally distributed). If the estimate in the numerator has $f_1$ degrees of freedom, and the estimate in the denominator has $f_2$ degrees of freedom then F is defined as

$$F(f_1,f_2) = s_{f_1}^2/s_{f_2}^2.$$

From (11.3.2) it is immediately seen that $t^2$ with f degrees of freedom is simply the special case† of the variance ratio with $f_1 = 1$ degree of freedom in the numerator, i.e. $t_f^2 = s_1^2/s_f^2 = F(1,f)$. Because the variance in the numerator can be found from (and used as a measure of the discrepancy of) k sample means, F is the required generalization of Student's t. Numerical examples occur in §§ 11.4 and 11.6.

† It is worth noting, in passing, that the chi-squared distribution is another special case of the variance ratio. Since $\chi^2$ with f degrees of freedom is the distribution of $fs^2/\sigma^2$ (see § 8.5) it follows that $\chi^2/f$ is simply $F(f,\infty)$, the population variance $\sigma^2$ being an estimate with $\infty$ d.f.






Imagine repeated samples drawn from a single population of normally distributed observations. From a sample of $f_1+1$ observations the sample variance, $s_{f_1}^2$, is calculated as an estimate of the population variance (with $f_1$ degrees of freedom). Another independent sample of $f_2+1$ observations is drawn from the same population and its sample variance, $s_{f_2}^2$, is also calculated. The ratio, F, of these two estimates of the population variance is calculated. If this process were repeated very many times the variability of the population of F values so produced would be described by the distribution of the variance ratio, tables of which are available.† The tables are more complicated than those for the distribution of t because both $f_1$ and $f_2$ must be specified as well as F and the corresponding P, so a three-dimensional table is needed. An example of the distribution of F for the case when $f_1 = 4$ and $f_2 = 6$ is shown in Fig. 11.3.1. Reference to the tables shows that in 10 per cent of repeated experiments F will be 3.18 or larger, as illustrated. The distribution has a different shape for different numbers of degrees of freedom, but it is always positively skewed so mode and mean differ (see § 4.5). Since numerator and denominator are estimates of the same quantity, values of F would be expected to be around one.

Fig. 11.3.1. Distribution of the variance ratio when there are 4 degrees of freedom for the estimate of $\sigma^2$ in the numerator and 6 degrees of freedom for that in the denominator. In 10 per cent of repeated experiments in the long run, the ratio of two such estimates of the same variance (null hypothesis true) will be 3.18 or larger. The mode of the distribution is less than 1, and the mean is greater than 1.

† For example, (a) Fisher and Yates (1963), Table V. In these tables values of F are given for P = 0.001, 0.01, 0.05, 0.1, and 0.2 (the '0.1 per cent, etc. percentage points' of F). The degrees of freedom $f_1$ and $f_2$ are denoted $n_1$ and $n_2$, and the variance ratio F is, for largely historical reasons, called $e^{2z}$ (the tables of z on the facing pages should be ignored). (b) Pearson and Hartley (1966), Table 18 give values of F for P = 0.001, 0.005, 0.01, 0.025, 0.05, 0.1, and 0.25. The degrees of freedom are denoted $\nu_1$ and $\nu_2$.

As in the case of $\chi^2$ (see § 8.6), deviations from the null hypothesis in either direction tend to increase the value of F (because squaring makes all deviations positive), so the area in one tail of the F distribution, as in Fig. 11.3.1, is appropriate for a two-tail test (see § 6.1) of significance in the analysis of variance. This can be seen clearly in the case of the t test. In § 9.4 it was found that the probability that $t_{18}$ will be either less than -2.101 or greater than +2.101 was 0.05. Either of these possibilities implies that $t_{18}^2 \ge 2.101^2$, i.e. $F(1,18) \ge 4.41$. Reference to the tables of F with $f_1 = 1$ and $f_2 = 18$ shows that F = 4.41 cuts off 5 per cent of the area in the upper tail of the distribution, the same result as the two-tail t test.

What to do if the variance ratio is less than one 

When the null hypothesis is true F would be expected to be less than one quite often, but the tables only deal with values of one or greater. In this case look up the reciprocal of the variance ratio, which will now have $f_2$ degrees of freedom for the numerator and $f_1$ for the denominator. The resulting value of P is the probability of F equal to or less than the original observed value. For example, if F(4,6) = 0.25, then look up F(6,4) = 1/0.25 = 4.0 with 6 d.f. for the numerator and 4 for the denominator. From the tables it is found that $P[F(6,4) \ge 4.0] = 0.1$, and therefore $P[F(4,6) \le 0.25] = 0.1$. Thus 10 per cent of the area lies below F = 0.25 in Fig. 11.3.1. The probability required for the analysis of variance is $P[F(4,6) \ge 0.25] = 1-0.1 = 0.9$. If a variance ratio is observed that is so small as to be very rare, it can only be assumed that either a rare event has happened, or else that the assumptions of the analysis are not fulfilled. Deviations from the null hypothesis can only result in a large variance ratio.
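The thought experiment behind the variance ratio distribution (repeated pairs of samples from one normal population) can be imitated by simulation. The sketch below is only an illustration under stated assumptions: the function names, the seed, the number of repeats, and the choice of a standard normal population are all arbitrary and not part of the text. It estimates the two tail areas quoted above for $f_1 = 4$ and $f_2 = 6$.

```python
import random

def sample_variance(values):
    """Sample variance with len(values) - 1 degrees of freedom."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

def simulate_variance_ratios(f1, f2, repeats, seed=1):
    """Return variance ratios from repeated pairs of independent normal samples."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    ratios = []
    for _ in range(repeats):
        # f + 1 observations give a variance estimate with f degrees of freedom.
        numerator_sample = [rng.gauss(0.0, 1.0) for _ in range(f1 + 1)]
        denominator_sample = [rng.gauss(0.0, 1.0) for _ in range(f2 + 1)]
        ratios.append(sample_variance(numerator_sample)
                      / sample_variance(denominator_sample))
    return ratios

ratios = simulate_variance_ratios(f1=4, f2=6, repeats=20000)
upper_tail = sum(r >= 3.18 for r in ratios) / len(ratios)
lower_tail = sum(r <= 0.25 for r in ratios) / len(ratios)
print(f"proportion with F >= 3.18: {upper_tail:.3f}")
print(f"proportion with F <= 0.25: {lower_tail:.3f}")
```

Both proportions should come out close to 0.1, the exact agreement depending on the number of repeats.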

11.4. Gaussian analysis of variance for k independent samples 
(the one way analysis of variance). An illustration of the 
principle of the analysis of variance 

The use of the variance ratio distribution to extend the method of § 9.4 to more than two samples will be illustrated using observations on the blood sugar level (mg/100 ml) of 28 rabbits shown in Table 11.4.1. As usual, the rabbits are supposed to be randomly drawn from a population of rabbits, and divided into four groups in a strictly random way (see § 2.3). One of the four treatments (e.g. drug, type of diet, or environment) is assigned to each group. Is there any evidence that the treatments affect the blood sugar level? Or, in other words, do the four mean levels differ by more than reasonably could be expected if the null hypothesis that all 28 observations were randomly selected from the same population (so the population means are identical) were true?

Table 11.4.1

Blood sugar level, (mg/100 ml) - 100, in four groups of seven rabbits. See § 2.1 for explanation of notation. The figures in parentheses are the ranks and rank sums for use in § 11.5

                                  Treatment (j)
                     1             2             3             4

                  17 (10½)      37 (25)       35 (22½)       9 (5)
                  16 (9)        36 (24)       22 (15)        8 (4)
                  28 (18)       21 (13½)      35 (22½)      17 (10½)
                   4 (3)        13 (7)        38 (26)       18 (12)
                  21 (13½)      45 (28)       31 (19)        1 (2)
                   0 (1)        23 (16½)      34 (20½)      34 (20½)
                  23 (16½)      13 (7)        40 (27)       13 (7)

Total T.j        109 (71.5)    188 (121)     235 (152.5)   100 (61)      Grand total G = 632
Mean             15.571        26.857        33.571        14.286        Grand mean = 632/28 = 22.571
Variance         102.95        158.14         34.29        109.24

Here the total for the jth treatment is $T_{.j} = \sum_{i=1}^{7} y_{ij}$, the grand total is $G = \sum_j\sum_i y_{ij}$, the treatment mean is $\bar{y}_{.j}$, and the grand mean is $\bar{y}_{..}$.

The assumptions concerning normality, homogeneity of variances, and the model involved in the following analysis have been discussed in § 11.2, which should be read before this section. Although the largest group variance in Table 11.4.1 is 4.6 times the smallest, this ratio is not large enough to provide evidence against homogeneity of the variances, as shown in § 11.2. Tests of normality are discussed in § 4.6.






The nonparametric method described in § 11.5 should generally be preferred to the methods of this section (see § 6.2).

The following discussion applies to any results consisting of k independent groups, the number of observations in the jth group being $n_j$ (the groups do not have to be of equal size). In this case k = 4 and all $n_j = 7$. The observation on the ith rabbit and jth treatment is denoted $y_{ij}$. The total and mean responses for the jth treatment are $T_{.j}$ and $\bar{y}_{.j}$. See § 2.1 for explanation of the notation.

The arithmetic has been simplified by subtracting 100 from every 
observation. This does not alter the variances calculated in the analysis 
(see (2.7.7)), or the differences between means. 

The problem is to find whether the means ($\bar{y}_{.j}$) differ by more than could be expected if the null hypothesis were true. There are four sample means and, as discussed in § 11.3, the extent to which they differ from each other (their scatter) can be measured by calculating the variance of the four figures.

The mean of the four figures is, in this case, the grand mean of all the observations (this is true only when the number of observations is the same in each group). The sum of squared deviations (SSD) is

$$\sum_{j=1}^{k}(\bar{y}_{.j}-\bar{y}_{..})^2 = (15.571-22.571)^2+(26.857-22.571)^2+(33.571-22.571)^2+(14.286-22.571)^2 = 257.02.$$

The sum of squares is based on four figures, i.e. 3 degrees of freedom, so the variance of the group means, calculated directly (cf. § 2.7) from their observed scatter is

$$\frac{257.02}{4-1} = 85.6733.$$

Is this figure larger than would be expected 'by chance', i.e. if the null hypothesis were true? It would be zero if all treatments had resulted in exactly the same blood sugar level, but of course this result would not be expected in an experiment, even if the null hypothesis were true. However, the result that would be expected can be predicted, because the null hypothesis is that all the observations come from a single Gaussian population. If the true mean and variance of this hypothetical population were $\mu$ and $\sigma^2$ then group means, which are means of 7 observations, would be predicted (from (2.7.8)) to have variance $\sigma^2/7$. If another estimate of $\sigma^2$, independent of differences between treatments, were obtainable then it would be possible to see whether this prediction was fulfilled. How this is done will be seen shortly.

With greater generality, suppose that k groups of n observations are to be compared. In the present example k = 4 and n = 7. If all the observations were from a single population with variance $\sigma^2$ then the variance of group means should be $\sigma^2/n$, so the variance of the means (85.673 in the present example), calculated directly from their observed scatter about the grand mean, should be an estimate of $\sigma^2/n$, calculated from differences between groups. Multiplying through by n, it therefore follows that

$$\frac{n\sum_{j=1}^{k}(\bar{y}_{.j}-\bar{y}_{..})^2}{k-1}$$

would be an estimate of $\sigma^2$, calculated from differences between groups, if the null hypothesis were true.

Thus in the present example 7 × 85.673 = 599.71 is an estimate of $\sigma^2$. It can be shown that if the number of observations were not equal, say $n_j$ observations in the jth group, then this expression would become

$$\frac{\sum_{j=1}^{k}n_j(\bar{y}_{.j}-\bar{y}_{..})^2}{k-1}. \tag{11.4.1}$$



However, if the population means of the groups were not all the same, i.e. if the null hypothesis were not true, then the above expression would not be an estimate of $\sigma^2$. Its numerator would be inflated by the real differences between means so, on the average, the estimate calculated from differences between groups would be an estimate of something larger than $\sigma^2$. The expectation of the between-groups mean square will be greater than $\sigma^2$ if the null hypothesis is not true, see p. 186.

To test whether this has happened an independent estimate of $\sigma^2$, not dependent on the assumption that the true (population) group means are equal, is needed. This can be obtained from differences within groups. The estimate of $\sigma^2$ calculated from within the jth group (simply the estimated variance of the group) is, as usual, found by summing the squares of deviations (SSD) of the individual observations in a group from the mean for that group. Thus, for the jth group,

$$\frac{\text{SSD within jth group}}{\text{degrees of freedom within jth group}} = \frac{\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_{.j})^2}{n_j-1}.$$

Now, since all groups are assumed to have the same variance, the information about the variance from all groups can be pooled to give a single estimate. This is done by dividing the total sum of squares by the total degrees of freedom, as in § 9.4, so

$$s^2[y] = \frac{\sum_{j=1}^{k}(\text{SSD within the jth group})}{\sum_{j=1}^{k}(n_j-1)} = \frac{\sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_{.j})^2}{N-k} \quad \text{(using (2.1.8) and (2.1.7)),} \tag{11.4.2}$$

where $N = \sum_{j=1}^{k} n_j$ is the total number of observations. This is the required estimate of $\sigma^2$ calculated from differences within groups. In the present example its value is 2427.714/(28-4) = 101.155. An easy method of calculating the numerator is given below.

Furthermore, if all N observations were from the same population, $\sigma^2$ could be estimated from the sum of squared deviations (SSD) of all of the N (= 28 in this case) observations from their mean (the grand mean $\bar{y}_{..}$). Thus, using (2.6.5),

$$\text{Total SSD} = \sum_j\sum_i(y_{ij}-\bar{y}_{..})^2 = \sum_j\sum_i y_{ij}^2-\Big(\sum_j\sum_i y_{ij}\Big)^2/N \tag{11.4.3}$$
$$= (17^2+16^2+\ldots+34^2+13^2)-(632)^2/28.$$

Tabulation of the results 

It has been shown that (11.4.1) and (11.4.2) would both be estimates of $\sigma^2$ if the null hypothesis were true, so their ratio would follow the F distribution (see § 11.3). If there is a difference between groups then (11.4.1) will be enlarged† and the observed F would therefore be, on the null hypothesis, improbably large. It is shown below that the sums of squares in the numerators of (11.4.1) and (11.4.2) add up to the total sum of squares (11.4.3). Furthermore, the number of degrees of freedom for between groups (11.4.1) comparisons, k-1, and that for within groups (11.4.2) comparisons, $\sum(n_j-1) = N-k$, add up to the total number of degrees of freedom.

These results can be written out as an analysis of variance table. All analysis of variance tables have the same column headings, but the sources of variation considered depend on the design of the experiment. For the one-way analysis the general result is as in Table 11.4.2.

Table 11.4.2

General one-way analysis of variance. Notice that if the null hypothesis were true all the figures in the mean square column would be estimates of the same quantity ($\sigma^2$). Mean square is just another name for variance (see § 2.6). Putting in the figures in the example gives Table 11.4.3

Source of                  Sum of                                      Mean square           Variance           P
variation         d.f.     squares                                     (or variance)         ratio

Between groups    k-1      $\sum_j n_j(\bar{y}_{.j}-\bar{y}_{..})^2$   SSD/d.f. = MS_b       F = MS_b/MS_w
Within groups     N-k      $\sum_j\sum_i(y_{ij}-\bar{y}_{.j})^2$       SSD/d.f. = MS_w
Total             N-1      $\sum_j\sum_i(y_{ij}-\bar{y}_{..})^2$

Table 11.4.3

Analysis of the rabbit blood sugar level observations in Table 11.4.1

Source of                       Sum of
variation           d.f.        squares      Mean square                  Variance ratio            P

Between treatments  4-1 = 3     1799.143     1799.143/3 = 599.71          599.71/101.155 = 5.93     < 0.005
Within treatments   28-4 = 24   2427.714     2427.714/24 = 101.155
Total               28-1 = 27   4226.857

† The expectation (long-run mean, see Appendix 1) of MS_w in Table 11.4.2 is $E[MS_w] = \sigma^2$, and in general $E[MS_b] = \sigma^2+\sum_{j=1}^{k}n_j(\tau_j-\bar{\tau})^2/(k-1)$ for the fixed effects model (11.2.1) discussed in § 11.2, so if the null hypothesis, that all the $\tau_j$ are the same, is true, $E[MS_b] = \sigma^2$ also. For the random effects model (see p. 178) $E[MS_b] = \sigma^2+n\sigma_\tau^2$ when all groups are the same size (n). And the null hypothesis is that $\sigma_\tau^2 = 0$, in which case again $E[MS_b] = \sigma^2$. (See Brownlee (1965, pp. 310 and 318).)



If the null hypothesis were true 599.71 and 101.16 would both be estimates of the same variance ($\sigma^2$). Whether this is plausible or not is found by referring their ratio, 5.93, to tables of the distribution of F (see § 11.3) with $f_1 = 3$ and $f_2 = 24$ degrees of freedom. This shows (see Fig. 11.3.1) that a value of F(3,24) = 5.93 would be exceeded in only about 0.5 per cent of experiments in the long run (if the assumptions made are satisfied). Therefore, unless it is preferred to believe that a 1 in 200 chance has come off, the premise that both 599.71 and 101.16 are estimates of the same variance will be rejected, and this implies that all the observations cannot have come from the same population, i.e. that the treatments really differ in their effect on blood sugar level (see § 6.1).

Notice that this does not say anything about which treatments 
differ from which others — whether all treatments differ, or whether 
three are the same and one different for example. The answering of this 
question raises some problems, and a method of doing it is discussed in 
§ 11.9. It is not correct to do t tests on all possible pairs of groups. 

A practical method for calculating the sum of squares 

A form of (11.4.1) that is more convenient for numerical computations can be derived along the same lines as (2.6.5). The SSD between groups, from (11.4.1), is

$$\sum_{j=1}^{k}n_j(\bar{y}_{.j}-\bar{y}_{..})^2 = \sum_{j=1}^{k}(n_j\bar{y}_{.j}^2-2n_j\bar{y}_{.j}\bar{y}_{..}+n_j\bar{y}_{..}^2)$$
$$= \sum_{j=1}^{k}n_j\bar{y}_{.j}^2-2\bar{y}_{..}\sum_{j=1}^{k}n_j\bar{y}_{.j}+\bar{y}_{..}^2\sum_{j=1}^{k}n_j. \tag{11.4.4}$$

In this expression consider

(a) the first term: because $\bar{y}_{.j} = T_{.j}/n_j$ (see Table 11.4.1 and § 2.1) the first term can be written $\sum_j n_j(T_{.j}/n_j)^2 = \sum_j T_{.j}^2/n_j$;

(b) the second term: again substituting the definition of $\bar{y}_{.j}$ shows that the second term is $2\bar{y}_{..}\sum_j n_j\bar{y}_{.j} = 2\bar{y}_{..}\sum_j T_{.j} = 2\bar{y}_{..}G = 2G^2/N$, because $\sum_j T_{.j}$ = sum of group totals = grand total, G, and $\bar{y}_{..} = G/N$;

(c) the third term: because the sum of the group numbers $\sum_j n_j = N$, the total number of observations, and $\bar{y}_{..} = G/N$, the third term is $\bar{y}_{..}^2\sum_j n_j = (G/N)^2N = G^2/N$.

Substitution of these three results in (11.4.4) gives

$$\text{SSD between groups} = \sum_{j=1}^{k}\left(\frac{T_{.j}^2}{n_j}\right)-\frac{G^2}{N},$$

i.e., writing out the summation term by term,

$$\text{SSD between groups} = \frac{T_{.1}^2}{n_1}+\frac{T_{.2}^2}{n_2}+\ldots+\frac{T_{.k}^2}{n_k}-\frac{G^2}{N}, \tag{11.4.5}$$

and this is the usual working formula. The formula for the total sum of squares, (2.6.5), can be regarded as the special case in which each total T contains n = 1 observation. In the present example

$$\text{SSD between groups} = \frac{109^2}{7}+\frac{188^2}{7}+\frac{235^2}{7}+\frac{100^2}{7}-\frac{(632)^2}{28} = 1799.143$$

as shown in Table 11.4.3.

The SSD within groups can now be found most easily by difference

$$\text{SSD within groups} = \text{total SSD}-\text{between-groups SSD} \tag{11.4.6}$$
$$= 4226.857-1799.143 = 2427.714,$$

as in Table 11.4.3.
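The working formulae (11.4.3), (11.4.5), and (11.4.6) translate directly into a few lines of code. The following Python sketch is an illustration only: the function name and the dictionary keys are invented for this example, and the observations are those of Table 11.4.1, entered so that they reproduce the printed group totals 109, 188, 235, and 100.

```python
# Blood sugar level (mg/100 ml) minus 100 for the four treatment groups
# of Table 11.4.1.
groups = [
    [17, 16, 28, 4, 21, 0, 23],
    [37, 36, 21, 13, 45, 23, 13],
    [35, 22, 35, 38, 31, 34, 40],
    [9, 8, 17, 18, 1, 34, 13],
]

def one_way_anova(groups):
    """One-way analysis of variance from the working formulae of this section."""
    n_total = sum(len(g) for g in groups)            # N
    grand_total = sum(sum(g) for g in groups)        # G
    correction = grand_total ** 2 / n_total          # G^2/N
    ssd_total = sum(y * y for g in groups for y in g) - correction        # (11.4.3)
    ssd_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction  # (11.4.5)
    ssd_within = ssd_total - ssd_between                                  # (11.4.6)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    ms_between = ssd_between / df_between
    ms_within = ssd_within / df_within
    return {
        "ssd_total": ssd_total,
        "ssd_between": ssd_between,
        "ssd_within": ssd_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "variance_ratio": ms_between / ms_within,
    }

result = one_way_anova(groups)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The printed sums of squares, mean squares, and variance ratio can be compared with the entries of Table 11.4.3. The value of P must still be found from tables of F, or from a statistical library if one is available.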

A digression to show that sum of squares between and within groups must add up to the total sum of squares

Consider the total sum of squared deviations (SSD) of the observations about the grand mean (11.4.3), namely $\sum_j\sum_i(y_{ij}-\bar{y}_{..})^2$. At first consider the SSD of the observations in the jth group about the grand mean, i.e.

$$\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_{..})^2 = \sum_{i=1}^{n_j}[(y_{ij}-\bar{y}_{.j})+(\bar{y}_{.j}-\bar{y}_{..})]^2$$
$$= \sum_{i=1}^{n_j}[(y_{ij}-\bar{y}_{.j})^2+2(y_{ij}-\bar{y}_{.j})(\bar{y}_{.j}-\bar{y}_{..})+(\bar{y}_{.j}-\bar{y}_{..})^2]$$
$$= \sum_{i=1}^{n_j}(y_{ij}-\bar{y}_{.j})^2+2(\bar{y}_{.j}-\bar{y}_{..})\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_{.j})+n_j(\bar{y}_{.j}-\bar{y}_{..})^2$$
$$= \sum_{i=1}^{n_j}(y_{ij}-\bar{y}_{.j})^2+n_j(\bar{y}_{.j}-\bar{y}_{..})^2.$$

The last step in this derivation follows from (2.6.1), which shows that $\sum_i(y_{ij}-\bar{y}_{.j}) = 0$, so the central term disappears.

If the above result is summed over the k groups, the required result is obtained, thus

$$\sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_{..})^2 = \sum_{j=1}^{k}\sum_{i=1}^{n_j}(y_{ij}-\bar{y}_{.j})^2+\sum_{j=1}^{k}n_j(\bar{y}_{.j}-\bar{y}_{..})^2 \tag{11.4.7}$$

Total SSD = SSD within groups + SSD between groups

This is a purely algebraic result and must hold for any set of numbers, but unless the observations can really be represented by the postulated model (see § 11.2) the components will not have the simple interpretation implied in the 'source of variation' column of Table 11.4.2.
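Because (11.4.7) is purely algebraic it can be checked on any numbers at all. The short Python sketch below does so for three groups of unequal size; the observations are hypothetical, made up only for this check, and each sum of squares is calculated directly from its definition rather than from the working formulae.

```python
# Hypothetical observations in three groups of unequal size (illustrative only).
groups = [[3.0, 5.5, 4.1], [7.2, 6.8, 9.0, 8.4], [1.0, 2.5]]

observations = [y for g in groups for y in g]
grand_mean = sum(observations) / len(observations)

# Each term of (11.4.7), calculated directly from its definition.
ssd_total = sum((y - grand_mean) ** 2 for y in observations)
ssd_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
ssd_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

print(ssd_total, ssd_within + ssd_between)
```

The two printed figures should agree apart from rounding error, whatever numbers are put into the groups.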



The t test on the results of Cushny and Peebles written as an analysis of 
variance 

The calculations in § 9.4 can usefully be written as an analysis of 
variance on the lines just described, with k = 2 independent groups. 
The results and necessary totals are given in Table 10.1.1. Again refer 
to § 2.1 if in doubt about the notation. 

The first step is usually to calculate $G^2/N$ as it appears several times in the calculations. This quantity is often called the correction factor for the mean, because, from (2.6.5), it corrects $\sum y^2$ to $\sum(y-\bar{y})^2$.

From Table 10.1.1:

(a) correction factor $G^2/N = (30.8)^2/20 = 47.4320$; (11.4.8)

(b) total sum of squares, from (2.6.5) (cf. (11.4.3)),

$$\sum_j\sum_i(y_{ij}-\bar{y}_{..})^2 = 0.7^2+1.6^2+\ldots+3.4^2-47.432 = 77.3680; \tag{11.4.9}$$

(c) sum of squares between columns (i.e. between drugs A and B), calculated from the working formula (11.4.5), is

$$\sum_{j=1}^{k}n_j(\bar{y}_{.j}-\bar{y}_{..})^2 = \frac{T_{.1}^2}{n_1}+\frac{T_{.2}^2}{n_2}-\frac{G^2}{N} = \frac{7.5^2}{10}+\frac{23.3^2}{10}-47.432 = 12.4820; \tag{11.4.10}$$

and, as above, when divided by its number of degrees of freedom this would give an estimate of $\sigma^2$ if the null hypothesis (that all observations are from a single population with variance $\sigma^2$) were true;

(d) the sum of squares within groups can now be found by difference, as in (11.4.6),

$$\sum_j\sum_i(y_{ij}-\bar{y}_{.j})^2 = 77.3680-12.4820 = 64.8860.$$

These results can now be assembled in an analysis of variance table, Table 11.4.4, which is just like Tables 11.4.2 and 11.4.3.

Table 11.4.4

                                    Sum of
Source of variation        d.f.     squares     MS         F        P

Between drugs               1       12.4820     12.4820    3.463    0.1-0.05
Error (or within drugs)    18       64.8860      3.6047
Total                      19       77.3680

Reference to tables of the distribution of F (see § 11.3) shows that, if the null hypothesis that the drugs are equi-effective were true, and if the assumptions of normality, etc. discussed in § 11.2 were true, a value of F(1,18) equal to or greater than 3.46 would occur in between 5 and 10 per cent of trials in the long run. This is exactly the same result as found in § 9.4, and P is not small enough to reject the null hypothesis. Because there are only two groups (k = 2), F has one (= k-1) degree of freedom in the numerator, and is therefore (see § 11.3) a value of $t^2$. Thus $\sqrt{F} = \sqrt{3.463} = 1.861$ is a value of t with 18 d.f., and is in fact identical with the value of t found in § 9.4.

Furthermore, the error (within groups) variance of the observations, from Table 11.4.4, is estimated to be 3.605, exactly the same as the pooled estimate of the variance of y found in § 9.4. Table 11.4.4 is just another way of writing the calculations of § 9.4.
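The identity between the two-sample t test and the analysis of variance with k = 2 can be checked numerically. In the Python sketch below the ten responses to each drug are the familiar Cushny and Peebles figures, which are assumed here to be those of Table 10.1.1 (that table is not reproduced in this section, but the totals 7.5, 23.3, and 30.8 agree with the values used above). The variable and function names are invented for the illustration.

```python
import math

# Responses to drugs A and B. ASSUMPTION: these are the values of Table 10.1.1;
# their totals (7.5 and 23.3) match those quoted in this section.
drug_a = [0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0]
drug_b = [1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6, 3.4]

def ssd(values):
    """Sum of squared deviations of the values about their own mean."""
    mean = sum(values) / len(values)
    return sum((y - mean) ** 2 for y in values)

n_a, n_b = len(drug_a), len(drug_b)
df_within = n_a + n_b - 2

# Two-sample t test with a pooled estimate of the variance (as in § 9.4).
pooled_variance = (ssd(drug_a) + ssd(drug_b)) / df_within
difference = sum(drug_b) / n_b - sum(drug_a) / n_a
t = difference / math.sqrt(pooled_variance * (1 / n_a + 1 / n_b))

# The same results written as an analysis of variance, (11.4.8) to (11.4.10).
correction = (sum(drug_a) + sum(drug_b)) ** 2 / (n_a + n_b)
ssd_total = sum(y * y for y in drug_a + drug_b) - correction
ssd_between = sum(drug_a) ** 2 / n_a + sum(drug_b) ** 2 / n_b - correction
ms_within = (ssd_total - ssd_between) / df_within
variance_ratio = ssd_between / ms_within

print(f"t = {t:.3f}, t squared = {t * t:.3f}, F = {variance_ratio:.3f}")
```

The printed values of $t^2$ and F should coincide, and the error mean square should equal the pooled variance used in the t test.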

11.5. Nonparametric analysis of variance for independent
samples by randomization. The Kruskal-Wallis method

As suggested in §§6.2 and 11.1, the methods in this section are 
to be preferred to that of § 11.4. 

The randomization method 

The randomization method of § 9.2 is easily extended to k samples and this is the preferred method, because it makes fewer assumptions than Gaussian methods. The disadvantage, as in § 9.2, is that tables cannot be prepared to facilitate the test, which will therefore be tedious (though simple) to calculate without an electronic computer. As in § 9.3, this can be overcome, with only a small loss in sensitivity, by replacing the original observations by their ranks, giving the Kruskal-Wallis method described below.

The principle of the randomization method is exactly as in §§ 9.2, 9.3, and 8.2 so the arguments will not all be repeated. If all four treatments were equi-effective then the observed differences between group means in Table 11.4.1, for example, must be due solely to the way the random numbers come up in the process of random allocation of a treatment to each rabbit. Whether such large (or larger) differences between treatment means are likely to have arisen in this way is again found by finding the differences between the treatment means that result from all possible ways of dividing the 28 observed figures into 4 groups of 7. On the assumption that, when doing the experiment, all ways were equiprobable, i.e. that the treatments were allocated strictly at random, the value of P is once again simply the proportion of possible allocations that give rise to discrepancies between treatment means as large as (or larger than) the observed differences. In § 9.2 the discrepancy between two means was measured by the difference between them. As explained in §§ 11.3 and 11.4, when there are more than two (say k) means, an appropriate thing to do is to measure their discrepancy by the variance of the k figures, i.e. by calculating, for each possible allocation, the 'between treatments sum of squares' as described in § 11.4.

An approximation to the answer could be obtained by card shuffling, as in § 8.2. The 28 observations from Table 11.4.1 would be written on cards. The cards would then be shuffled, dealt into four groups of seven, and the 'between treatments sum of squares' calculated. This would be repeated until a reasonable estimate was obtained of the proportion (P) of shufflings giving a sum of squares equal to or larger than the value observed in the experiment. In fact, just as in § 9.2 it was found to be sufficient to calculate the total response for the smaller group, so, in this case, it is sufficient to calculate $\sum T_{.j}^2/n_j$ for each possible allocation, because once this is known the between-treatments sum of squares, or between-treatments F ratio, follows from the fact that the total sum of squares is the same for every possible allocation.

By a slight extension of (3.4.3), the number of possible allocations of N objects into k groups of size $n_1, n_2, \ldots, n_k$ ($\sum n_j = N$) is $N!/(n_1!\,n_2!\ldots n_k!)$. In the case of Table 11.4.1 there are therefore 28!/(7!7!7!7!) = 472518347558400 possible allocations. This is rather too many to enumerate by hand (doing one every 5 minutes it would take about 20 thousand million normal working years), though it is easy to select a sufficiently large random sample of them with an electronic computer.

If a program for this is not available, the recommended procedure for analysis of k independent samples is, just as in § 9.3, to replace the observations by ranks allowing tables to be constructed. This is known as the Kruskal-Wallis method.
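The card-shuffling procedure described above is easily imitated by a computer. The Python sketch below is one possible way of doing it, not a prescribed program: the function names, the seed, and the number of shufflings are arbitrary choices made for the illustration. It counts the possible allocations for Table 11.4.1 and then estimates P from a random sample of them, using $\sum T_{.j}^2/n_j$ as the measure of discrepancy between treatment means.

```python
import math
import random

# Observations of Table 11.4.1 (blood sugar level minus 100), by treatment.
groups = [
    [17, 16, 28, 4, 21, 0, 23],
    [37, 36, 21, 13, 45, 23, 13],
    [35, 22, 35, 38, 31, 34, 40],
    [9, 8, 17, 18, 1, 34, 13],
]

def sum_t_squared_over_n(groups):
    """Sum over groups of (group total)^2 / (group size)."""
    return sum(sum(g) ** 2 / len(g) for g in groups)

def randomization_p(groups, n_shuffles, seed=1):
    """Estimate P as the proportion of random re-allocations that give a
    discrepancy at least as large as that actually observed."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    observed = sum_t_squared_over_n(groups)
    pooled = [y for g in groups for y in g]
    sizes = [len(g) for g in groups]
    count = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)                 # 'shuffle the cards'
        dealt, start = [], 0
        for size in sizes:                  # 'deal' them into groups
            dealt.append(pooled[start:start + size])
            start += size
        # Small tolerance guards against rounding error in the comparison.
        if sum_t_squared_over_n(dealt) >= observed - 1e-9:
            count += 1
    return count / n_shuffles

n_allocations = math.factorial(28) // math.factorial(7) ** 4
p_estimate = randomization_p(groups, n_shuffles=20000)
print(n_allocations, p_estimate)
```

The estimate of P varies a little with the seed and the number of shufflings, as it would with real card shuffling. A larger sample of allocations gives a more precise estimate.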

The k sample randomization test on ranks. The Kruskal-Wallis one-way 
analysis of variance 

This is simply an extension to more than two (k, say) groups of the Wilcoxon two-sample test (see above, and § 9.3). As before, the null hypothesis is that all N observations are from the same population, and if this is rejected the conclusion will be that the populations differ. If it is wished to conclude that the population medians differ then it must be assumed that the underlying distributions (before ranking) for all groups are the same apart from the medians, though no particular form of distribution is assumed except that it must be continuous (see § 4.1). This implies that the variance is assumed to be the same for all groups.

Again the method can be applied when the observations are themselves not numerical measurements but ranks, or arbitrary scores that must be reduced to ranks, as well as when they are numerical measurements.

All N observations are ranked in ascending order, ties being given the average rank as in § 9.3 (see Table 9.3.1). Table 11.5.1 shows the results of an experiment in which 11 patients were divided randomly into k = 3 groups, each being given a different analgesic drug (A, B, and C). In each group the figure recorded is the total subjective pain score recorded over a period of time by each patient. Such measurements should not be treated like numerical measurements but should be ranked. The ranks are shown in the table together with the rank sum, $R_j$, for each group (j = 1, 2, ..., k). The number of observations in each group is $n_j$ and the total number is $N = \sum n_j$ as in § 11.4.

Table 11.5.1

                            Treatment
              A                 B                 C
         Score   Rank      Score   Rank      Score   Rank

           7      4          1      1         17      9
          16      8          5      2         31     11
          12      6          6      3         14      7
           8      5                           22     10

Total         R_1 = 23          R_2 = 6           R_3 = 37

The measure of the extent to which the treatments differ, analogous to the rank sum of the smaller sample used in § 9.3, is the statistic H defined as

$$H = \frac{12}{N(N+1)}\sum_{j=1}^{k}\frac{R_j^2}{n_j}-3(N+1), \tag{11.5.1}$$

as long as there are not too many ties (see below). Notice that the term $\sum R_j^2/n_j$ makes H similar in character to a between-groups sum of squares (11.4.5). For the results in Table 11.5.1, N = 11, $n_1 = 4$, $n_2 = 3$, $n_3 = 4$, $R_1 = 23$, $R_2 = 6$, and $R_3 = 37$. Applying the check (9.3.1) gives the sum of all ranks as N(N+1)/2 = 66 and in fact 23 + 6 + 37 = 66. Using these values with (11.5.1) gives

$$H = \frac{12}{11(11+1)}\left(\frac{23^2}{4}+\frac{6^2}{3}+\frac{37^2}{4}\right)-3(11+1) = 8.227.$$

Table A5 gives the exact distribution of H found, as in § 9.3, by the randomization method. It shows that for sample sizes 4, 4, and 3 (the order of these figures is irrelevant) a value of $H \ge 7.1439$ would occur in 1 per cent of trials (P = 0.01) in the long run if the null hypothesis were true, therefore H = 8.227 must be even rarer, i.e. P < 0.01. As in § 11.3 deviations from the null hypothesis in any direction increase the size of H. Again, as in all analyses of variance, this result does not give any information about differences between individual pairs of groups (see § 11.9).

Example with larger samples. Table A5 only deals with k = 3 groups with not more than 5 observations in any group. For larger experiments it is a sufficiently good approximation to assume that H is distributed like chi-squared with k-1 degrees of freedom. P can then be found from the chi-squared tables (see § 8.5). For example, the results in Table 11.4.1 have been converted to ranks, shown in parentheses. In this case N = 28, $n_1 = n_2 = n_3 = n_4 = 7$, $R_1 = 71.5$, $R_2 = 121$, $R_3 = 152.5$, and $R_4 = 61$. Applying the check N(N+1)/2 = 28(28+1)/2 = 406 and, correctly, $R_1+R_2+R_3+R_4 = 406$. Thus, from (11.5.1),

$$H = \frac{12}{28(28+1)}\left(\frac{71.5^2}{7}+\frac{121^2}{7}+\frac{152.5^2}{7}+\frac{61^2}{7}\right)-3(28+1) = 11.66.$$

Consulting a table of the chi-squared distribution (Fisher and Yates (1963, Table IV) or Pearson and Hartley (1966, Table 8)) with k-1 = 3 degrees of freedom shows that $\chi^2 = 11.345$ would be exceeded in 1 per cent of experiments in the long run if the null hypothesis were true, so P < 0.01 for the observed value of 11.66. This is somewhat larger than the value of P < 0.005 found when the assumptions of the Gaussian analysis of variance were thought justified (Table 11.4.3), but it is still small enough to cast considerable doubt on the null hypothesis.
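The calculation of H from (11.5.1), including the allocation of average ranks to ties, can be sketched in Python as follows. The function names are invented for this illustration and no correction for ties is applied. For the small example the ranks of Table 11.5.1 are entered directly as the observations (ranking them again leaves them unchanged), and for the larger example the blood sugar observations of Table 11.4.1 are used.

```python
def midranks(values):
    """Rank values in ascending order, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with the one at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for position in range(i, j + 1):
            ranks[order[position]] = average_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Kruskal-Wallis statistic H of (11.5.1), without correction for ties."""
    pooled = [y for g in groups for y in g]
    ranks = midranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # R_j
        total += rank_sum ** 2 / len(g)              # R_j^2 / n_j
        start += len(g)
    return 12.0 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Ranks from Table 11.5.1 (treatments A, B, and C).
analgesics = [[4, 8, 6, 5], [1, 2, 3], [9, 11, 7, 10]]

# Observations from Table 11.4.1 (blood sugar level minus 100).
rabbits = [
    [17, 16, 28, 4, 21, 0, 23],
    [37, 36, 21, 13, 45, 23, 13],
    [35, 22, 35, 38, 31, 34, 40],
    [9, 8, 17, 18, 1, 34, 13],
]

h_analgesics = kruskal_wallis_h(analgesics)
h_rabbits = kruskal_wallis_h(rabbits)
print(f"H = {h_analgesics:.3f} (Table 11.5.1), H = {h_rabbits:.2f} (Table 11.4.1)")
```

The two values of H can be compared with those found above. P is then found from Table A5, or from the chi-squared tables with k-1 degrees of freedom for larger samples.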

As in the Gaussian analysis, the finding that all k groups are unlikely 
to be the same says nothing about which groups differ from which 
others. A method for testing all pairs of groups to answer this question 
is described in § 11.9. It is not correct to do two-sample Wilcoxon 
tests on all possible pairs. 

Correction for ties. Unless there is a very large number of ties the 
correction factor (described, for example, by Brownlee (1965, p. 256)) 
has a negligible effect. It always makes H larger, and hence P smaller, 
so there is no danger that neglecting the correction factor will lead to 
rejection of the null hypothesis when it would otherwise not have been. 

11.6. Randomized block designs. Gaussian analysis of variance 
for k related samples (the two-way analysis of variance) 

In §§10.1 and 6.4 it was pointed out that if the experimental units 
(e.g. patients, periods of time) can be selected in some way to form 
groups that give more homogeneous and reproducible responses than 
units selected at random, then it will be advantageous if all the treat- 
ments (k in number, say) to be compared, are compared on the units of 
such a group. The group is known as a block. The units comprising the 
block are sometimes, because of the agricultural origins of the design, 
known as plots. It must clearly contain as many (k) experimental 
units as there are treatments, or at least a multiple of k. The k treat- 
ments must be allocated strictly randomly (see § 2.3) to the k units of 
each block. Because every treatment is tested in every block the 
blocks are described as complete (cf. § 11.8). This section deals with 
randomized complete block experiments when the observations are 
described by the simple additive model with normally distributed error 
(11.2.2) described in § 11.2, which should be read before this section. 

The analysis in § 10.6, Student's paired t test, was an example of 
a randomized block experiment with k = 2 treatments and 2 units 
(periods of time) in each block (patient). This test will now be 
reformulated as an analysis of variance. 

The paired t test written as an analysis of variance 

The observations of Cushny and Peebles in Table 10.1.1 were 
analysed by a paired t test in § 10.6. At the end of § 11.4 it was shown 
how a sum of squares for differences between drugs could be obtained, 
but no account was taken of the block arrangement actually used in the 
experiment. As before (see §§ 11.3 and 11.4) the calculations are such 
that if the null hypothesis (that all observations are from the same 
Gaussian population with variance $\sigma^2$) were true, then the 
quantities in the mean-square column of the analysis of variance would 
all be independent estimates of $\sigma^2$.† 

Because of the symmetry of the design it is possible to obtain an 
estimate of $\sigma^2$, on the null hypothesis that there is no real 
difference between blocks (patients), by considering deviations of 
block means from the grand mean, i.e. by analogy with (11.4.1), from 
$k\sum_i(\bar{y}_{i.} - \bar{y}_{..})^2/(n-1)$, where k = number of 
treatments = number of observations per block, and n = number of 
blocks = number of observations on each treatment. Unlike the one-way 
analysis, n must be the same for all treatments. N = kn is the total 
number of observations. From (11.4.5), it can be seen that the 
numerator of this (sum of squares between blocks) can most simply be 
calculated from the block totals, as the sum of squares between 
treatments was found from treatment totals in (11.4.5). From the 
results in Table 10.1.1 

$$\text{SSD between blocks} = \sum_{i=1}^{n}\frac{T_{i.}^2}{k} - \frac{G^2}{N} \tag{11.6.1}$$

$$= \frac{2.6^2}{2} + \frac{0.8^2}{2} + \dots + \frac{5.4^2}{2} - 47.4320 = 58.0780.$$

† The expected values of the mean squares (see § 11.2, p. 173 and § 11.4, p. 186) are 
derived by Brownlee (1965, Chapter 14). Often the mixed model in which treatments 
are fixed effects and blocks are random effects is appropriate (loc. cit., p. 498). 






In this, the group (row, block) totals, $T_{i.}$, are squared and 
divided by the number of observations per total just as in (11.4.5). 
See §§ 2.1 and 11.4 if clarification of the notation is needed. Since 
there are n = 10 groups (rows, blocks) this sum of squares has 
n - 1 = 9 degrees of freedom. 

The values of $G^2/N$ (47.4320), and of the sum of squares between 
drugs (treatments, columns) (12.4820) and the total sum of squares 
(77.3680), are found exactly as in (11.4.8)-(11.4.10). The results are 
assembled in Table 11.6.1. 

Table 11.6.1 

The paired t test of § 10.6 written as an analysis of variance. The mean 
squares are found by dividing the sums of squares by their d.f. The F ratios 
are the ratio of each mean square to the error mean square 

Source of variation            d.f.   Sum of     Mean       F          P 
                                      squares    square 
Between treatments (drugs)       1    12.4820    12.4820    16.5009    0.001-0.005 
Between blocks (patients)        9    58.0780     6.4531     8.531     0.001-0.005 
Error                            9     6.8080     0.7564 
Total                           19    77.3680 

The residual or error sum of squares is again found by difference 
(77.3680 - 58.0780 - 12.4820 = 6.8080), and so is its number of 
degrees of freedom (19 - 9 - 1 = 9). The error mean square will be an 
estimate of the variance of the observations, $\sigma^2$, after the 
elimination of variability due to differences between treatments 
(drugs) and blocks (patients), i.e. the variance the observations 
would have because of sources of experimental variability if there were 
no such differences. This is only true if there are no interactions and the 
simple additive model (11.2.2) represents the observations. The other 
mean squares would also be estimates of $\sigma^2$ if the null hypothesis 
were true, and therefore, when the null hypothesis is true, the ratio of 
each mean square to the error mean square should be distributed like 
the variance ratio (see § 11.3). If the size of the F ratio is so large as to 
make its observance a very rare event, the null hypothesis will be 
abandoned in favour of the idea that the numerator of the ratio has 
been inflated by real differences between treatments (or blocks). 

The variance ratio for testing differences between drugs is 16.5009 
with one d.f. in the numerator and 9 in the denominator. Reference to 
tables of the F distribution (see § 11.3) shows that F(1,9) = 13.61 
would be exceeded in 0.5 per cent, and F(1,9) = 22.86 in 0.1 per cent 
of trials in the long run, if the null hypothesis were true. The observed 
F falls between these figures so 0.001 < P < 0.005, just as in § 10.6. 

As pointed out in § 11.3 (and exemplified in § 11.4), F with 1 d.f. 
for the numerator is just a value of $t^2$, so $\sqrt{F(1,9)} = 
\sqrt{16.5009} = 4.062 = t(9)$, a value of t with 9 d.f., exactly the 
same value as was found in the paired t test (§ 10.6). Furthermore, the 
error variance from Table 11.6.1 is 0.7564 with 9 d.f. The error 
variance of the difference between two observations should therefore, 
by (2.7.3), be 0.7564 + 0.7564 = 1.513, which is exactly the figure 
estimated directly in § 10.6. 
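
The relations just described can be verified from the entries of 
Table 11.6.1 alone. The following is only a sketch of that arithmetic; 
the variable names are chosen here for illustration and do not appear 
in the text. 

```python
import math

# Sums of squares and degrees of freedom as given in Table 11.6.1.
ss_treat, df_treat = 12.4820, 1
ss_block, df_block = 58.0780, 9
ss_total, df_total = 77.3680, 19

# Error line found by difference, as described above.
ss_error = ss_total - ss_block - ss_treat
df_error = df_total - df_block - df_treat
ms_error = ss_error / df_error

f_treat = (ss_treat / df_treat) / ms_error   # variance ratio for drugs
f_block = (ss_block / df_block) / ms_error   # variance ratio for patients
t_paired = math.sqrt(f_treat)                # F with 1 d.f. is t squared
var_difference = 2 * ms_error                # variance of a difference, (2.7.3)

print(round(f_treat, 4), round(f_block, 3),
      round(t_paired, 3), round(var_difference, 3))
```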

If there were no real differences between blocks (patients) then 
6.453 would be an estimate of the same $\sigma^2$ as the error mean 
square 0.756. Referring this ratio (8.531) to tables (see § 11.3) of the 
distribution of the F ratio (with $f_1 = 9$ d.f. for the numerator and 
$f_2 = 9$ d.f. for the denominator) shows that the probability of an F 
ratio at least as large as 8.531 would be between 0.001 and 0.005, if 
the null hypothesis were true. 

This analysis, and that in § 11.4, show clearly why t had 18 d.f. in the 
unpaired t test (§ 9.4) but only 9 d.f. in the paired t test (§ 10.6). In the 
latter, 9 d.f. were used up by comparisons between patients (blocks). 

There is quite strong evidence (given the assumptions in § 11.2) 
that there are real differences between the treatments (drugs), as 
concluded in § 10.5. This is because an F ratio, i.e. difference between 
treatments relative to experimental error, as large as, or larger than, 
that observed (16.50) would be rare if there were no real (population) 
difference between the treatments (see §§6.1 and 11.3). Similarly, 
there is evidence of differences between blocks (patients). 

An example of a randomized block experiment with four treatments 

The following results are from an experiment designed to show 
whether the response (weal size) to intradermal injection of antibody 
followed by antigen depends on the method of preparation of the 
antibody. Four different preparations (A, B, C, and D) were tested. 
Each preparation was injected once into each of four guinea pigs. The 
preparation to be given to each of the four sites on each animal was 
decided strictly at random (see § 2.3). Guinea pigs are therefore blocks 
in the sense described above. The results are in Table 11.6.2. (This is 
actually an artificial example. The figures are taken from Table 13.11.1 
to illustrate the analysis.) 




Table 11.6.2 
Weal diameters using four antibody preparations in guinea pigs 

                     Antibody preparation (treatment) 
Guinea pig 
(block)               A       B       C       D      Totals ($T_{i.}$) 
1                    41      61      62      43        207 
2                    48      68      62      48        226 
3                    53      70      66      53        242 
4                    56      72      70      52        250 
Total ($T_{.j}$)    198     271     260     196      G = 925 
Mean               49.5    67.7    65.0    49.0 

The calculations follow exactly the same pattern as above. 

(1) 'Correction factor' (see (11.4.8)). 

$$\frac{G^2}{N} = \frac{(925)^2}{16} = 53\,476.5625.$$

(2) Between antibody preparations (treatments). From (11.4.5), 

$$\text{SSD} = \frac{198^2}{4} + \frac{271^2}{4} + \frac{260^2}{4} + \frac{196^2}{4} - 53\,476.5625 = 1188.6875.$$

(3) Between guinea pigs (blocks). From (11.6.1), 

$$\text{SSD} = \frac{207^2}{4} + \frac{226^2}{4} + \frac{242^2}{4} + \frac{250^2}{4} - 53\,476.5625 = 270.6875.$$

(4) Total sum of squared deviations. From (2.6.5) (or (11.4.3)), 

$$\text{SSD} = 41^2 + 48^2 + \dots + 53^2 + 52^2 - 53\,476.5625 = 1492.4375.$$

(5) Error sum of squares found by difference. 

$$\text{SSD} = 1492.4375 - (1188.6875 + 270.6875) = 33.0625.$$

There are 3 d.f. each for blocks and for treatments (because there are 
4 blocks and 4 treatments) and the total number of d.f. is N - 1 = 15, 
so, by difference, there are 15 - (3+3) = 9 d.f. for error. These 
results are assembled in Table 11.6.3. Comparison of each mean square 
with the error mean square gives variance ratios (both with $f_1 = 3$ 
and $f_2 = 9$ d.f.) which, according to tables of the F distribution 
(see § 11.3), would be very rare if the null hypothesis were true. It is 
concluded that there is evidence for real differences between different 
antibody preparations, and between different animals, for the same 
reasons as in the previous example. 



Table 11.6.3 

Source of variation            d.f.   Sum of       Mean       F         P 
                                      squares      square 
Between antibody preps. 
  (treatments)                   3    1188.6875    396.229    107.86    < 0.001 
Between guinea pigs 
  (blocks)                       3     270.6875     90.229     24.56    < 0.001 
Error                            9      33.0625      3.674 
Total                           15    1492.4375 
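
The five steps of the calculation, and the entries of Table 11.6.3, 
can be reproduced with the following sketch. It assumes the 
observations of Table 11.6.2 are held as a list of rows (one row per 
guinea pig); the function name and the dictionary keys are illustrative 
choices, not notation from the text. 

```python
# Rows are guinea pigs (blocks); columns are preparations A, B, C, D.
weals = [
    [41, 61, 62, 43],
    [48, 68, 62, 48],
    [53, 70, 66, 53],
    [56, 72, 70, 52],
]

def randomized_block_anova(table):
    """Two-way analysis of variance for one observation per cell."""
    n, k = len(table), len(table[0])           # n blocks, k treatments
    grand = sum(sum(row) for row in table)     # G
    correction = grand ** 2 / (n * k)          # G^2 / N, step (1)
    ss_treat = sum(sum(col) ** 2 for col in zip(*table)) / n - correction
    ss_block = sum(sum(row) ** 2 for row in table) / k - correction
    ss_total = sum(y * y for row in table for y in row) - correction
    ss_error = ss_total - ss_treat - ss_block  # by difference, step (5)
    df_error = (n * k - 1) - (k - 1) - (n - 1)
    ms_error = ss_error / df_error
    return {
        "ss_treat": ss_treat, "ss_block": ss_block,
        "ss_total": ss_total, "ss_error": ss_error,
        "df_error": df_error,
        "f_treat": (ss_treat / (k - 1)) / ms_error,
        "f_block": (ss_block / (n - 1)) / ms_error,
    }

anova = randomized_block_anova(weals)
for name, value in anova.items():
    print(name, round(value, 4))
```

The P values still have to be found from tables of the F distribution, 
as in the text; the sketch stops at the variance ratios. 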







Multiple comparisons. As in §§ 11.4 and 11.5, if it is wished to go 
further and decide which antibody preparations differ from which 
others the method described in § 11.9 must be used. It is not correct 
to do paired t tests on all possible pairs of treatments. 



11.7. Nonparametric analysis of variance for randomized blocks. 
The Friedman method 

Just as in §§ 9.2, 10.3, and 11.5 the best method of analysis is to 
apply the randomization method to the original observations. The 
principles of the method have been discussed in §§ 6.3, 8.2, 9.2, 9.3, 10.3, 
10.4, and 11.5. Reasons for preferring this sort of test are discussed in 
§ 6.2. As before, the drawback of the method is that tables cannot be 
prepared, so the calculations will be tedious without a computer, though 
they are very simple; and as before, this disadvantage can be overcome 
by using ranks in place of the original observations (the Friedman 
method). 

The randomization method 

The argument, simply an extension to more than two samples of 
that in § 10.3, is again that if the treatments were all equi-effective 
each subject would have given the same measurement whichever 
treatment had been administered (see p. 117 for details), so the observed 
difference between treatments would be merely a result of the way the 
random numbers came up when allocating the k treatments to the k 
units in each block (see § 11.6). There are k! possible ways 
(permutations) of administering the treatments in each block, so if 
there are n blocks there are $(k!)^n$ ways in which the randomization 
could come out (an extension to k treatments of the $2^n$ ways found in 
§ 10.3). If the randomization was done properly these would all be 
equiprobable, and if the F ratio for 'between treatments' is calculated 
for all of these ways (cf. § 11.5), the proportion of cases in which F 
is equal to or larger than the observed value is the required P, as in 
§ 10.3. As in § 11.5 it will give the same result if the sum of squared 
treatment totals, rather than F, is calculated for each arrangement. 

As in previous cases an approximation to this result could be obtained 
by writing down the observations on cards. The cards for each block 
would be separately shuffled and dealt. The first card in each block 
would be labelled treatment A, the second treatment B, and so on. 
If this process were repeated many times, an estimate of the proportion 
(P) of cases giving 'between treatments' F ratios as large as, or larger 
than, the observed ratio could be found. If this proportion was small 
it would indicate that it was improbable that a random allocation 
would give rise to the observed result, if the observation was not 
dependent on the treatment given. In other words, an allocation that 
happened to put into the same treatment group subjects that were, 
despite any treatment, going to give a large observation, would be 
unlikely to turn up. 
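
For an experiment as small as that in Table 11.6.2 the $(k!)^n = 
(4!)^4 = 331\,776$ arrangements can simply be enumerated rather than 
sampled by shuffling. The sketch below does this using the sum of 
squared treatment totals as the criterion; the function names are 
illustrative, and no value of P is quoted here because the text does 
not give one for these observations. 

```python
from itertools import permutations, product

# Observations of Table 11.6.2: rows are blocks, columns are treatments.
weals = [
    [41, 61, 62, 43],
    [48, 68, 62, 48],
    [53, 70, 66, 53],
    [56, 72, 70, 52],
]

def sum_squared_totals(arrangement):
    """Sum of squared treatment (column) totals for one arrangement."""
    return sum(sum(col) ** 2 for col in zip(*arrangement))

def randomization_test(table):
    """Count arrangements at least as extreme as the one observed.

    Every block is permuted independently, so all (k!)^n equiprobable
    allocations are visited exactly once.
    """
    observed = sum_squared_totals(table)
    per_block = [list(permutations(row)) for row in table]
    as_extreme = total = 0
    for arrangement in product(*per_block):
        total += 1
        if sum_squared_totals(arrangement) >= observed:
            as_extreme += 1
    return as_extreme, total

as_extreme, total = randomization_test(weals)
p_exact = as_extreme / total
print(as_extreme, total, p_exact)
```

With more or larger blocks complete enumeration soon becomes 
impracticable, and a random sample of arrangements (the card-shuffling 
procedure described above) would have to be used instead. 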

The analysis of randomized blocks by ranks. The Friedman method 

If the observations are replaced by ranks, tables can be constructed 
to make the randomization test very simple. 

The null hypothesis is that the observations in each block are all 
from the same population, and if this is rejected it will be supposed that 
the observations in any given block are not all from the same 
population, because the treatments differ in their effects. If it is 
wished to conclude that the median effects of the treatments differ it 
must be assumed that the underlying distribution (before ranking) of 
the observations is the same for observations in any given block, though 
the form of the distribution need not be known, and it need not be the 
same for different blocks. 

As in the case of the sign test for two treatments (§ 10.2) the 
observations within each block (pair, in the case of the sign test) are 
ranked. If the observations in each block are themselves not proper 
measurements but ranks, or arbitrary scores which must be reduced to 
ranks, the Friedman method, like the sign test, is still applicable. In 
fact the Friedman method, in the special case of k = 2 treatments, 
becomes identical with the sign test. Compare this with the Wilcoxon 
signed-ranks test (§ 10.4) in which proper numerical measurements were 
necessary because differences had to be formed between members of a 
pair before the differences could be ranked. 

Suppose, as in § 11.6, that k treatments are compared, in random 
order, in n blocks. The method is to rank the k observations in each 
block from 1 to k, if the observations are not already ranks. The rank 
totals, $R_j$ (see §§ 2.1, 11.4, and 11.5 for notation), are then found 
for each treatment. If there were no difference between treatments these 
totals would be approximately the same for all treatments. The sum 
of the ranks (integers 1 to k) in each block should be k(k+1)/2, by 
(9.3.1), and because there are n blocks the sum of the rank sums should 
be 

$$\sum_j R_j = nk(k+1)/2. \tag{11.7.1}$$

As a measure of the discrepancy between rank sums now simply 
calculate S, the sum of squared deviations of the rank sums for each 
treatment from their mean (cf. (11.4.1)). From (2.6.5), this is 

$$S = \sum_j (R_j - \bar{R})^2 = \sum_j R_j^2 - \frac{\left(\sum_j R_j\right)^2}{k}. \tag{11.7.2}$$

The exact distribution of this quantity, calculated according to 
the randomization method (see the sections referred to at the start of 
this section), is given in Table A6, for various numbers of treatments 
and blocks. For experiments with more treatments or blocks than are 
dealt with in Table A6, it is a sufficiently good approximation to 
calculate 

$$\chi^2 = \frac{12S}{nk(k+1)} \tag{11.7.3}$$

and find P from tables of the chi-squared distribution (e.g. Fisher and 
Yates (1963, Table IV) or Pearson and Hartley (1966, Table 8)) with 
k - 1 degrees of freedom. 

As an example, consider the results in Table 11.6.2, with k = 4 
treatments and n = 4 blocks. If the observations in each block (row) 
are ranked in ascending order from 1 to 4, the results are as shown in 
Table 11.7.1. Ties are given average ranks as in Table 9.3.1. This is an 
approximation but it is not thought to affect the result seriously if the 
number of ties is not too large. 



Applying the check (11.7.1) shows that $\sum R_j$ should be 
4 × 4 × (4+1)/2 = 40, as found in Table 11.7.1. Now calculate, from 
(11.7.2), 

$$S = 6^2 + 15^2 + 13^2 + 6^2 - \frac{(40)^2}{4} = 66.00.$$

Consulting Table A6 shows that when k = 4 and n = 4, S = 64 
corresponds to P = 0.0069. So the observed S = 66 corresponds to 
P < 0.0069. This means that if the treatments were equi-effective 
(null hypothesis) then in less than 0.69 per cent of experiments in the 
long run would a random allocation of treatments to the units of each 
block be chosen that gave differences between treatment rank sums 
(i.e. a value of S) as large as, or larger than, that observed (S = 66). 
The null hypothesis of equi-effectiveness is therefore rejected, though 
not with as much confidence as when the same results were analysed 
by the Gaussian analysis of variance in § 11.6. In Table 11.6.3, it was 
seen that, if the assumptions made (see § 11.2) were correct, P < 0.001, 
much lower than that found by the present method. 

Table 11.7.1 

The observations within each block in Table 11.6.2 reduced to ranks 

Guinea pig           Antibody preparation (treatment) 
(block)               A          B           C           D 
1                     1          3           4           2 
2                     1.5        4           3           1.5 
3                     1.5        4           3           1.5 
4                     2          4           3           1 
Rank sum ($R_j$)    $R_1 = 6$  $R_2 = 15$  $R_3 = 13$  $R_4 = 6$    $\sum R_j = 40$ 

If the experiment had been outside the scope of Table A6 then 
(11.7.3) would have been used, giving $\chi^2 = 12 \times 66/[4 \times 
4(4+1)] = 9.90$. Consulting tables of the chi-squared distribution (see 
above) with k - 1 = 3 degrees of freedom shows that a value of 9.837 
would be exceeded in 2 per cent of experiments in the long run, so 
P ≈ 0.02. This is not a very good approximation, in such small samples, 
to the exact value of P (just less than 0.0069) found from Table A6. 
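
The ranking, the rank sums, S, and the chi-squared approximation for 
this example can be followed in the sketch below. The function names 
are illustrative; ties receive average ranks as in Table 11.7.1, and the 
tail probability again uses the closed form that is valid only for 3 
degrees of freedom (i.e. k = 4 treatments). 

```python
import math

# Observations of Table 11.6.2: rows are blocks, columns are treatments.
weals = [
    [41, 61, 62, 43],
    [48, 68, 62, 48],
    [53, 70, 66, 53],
    [56, 72, 70, 52],
]

def average_ranks(values):
    """Rank values from 1 upwards; tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def friedman_s(table):
    """Rank within each block, then form rank sums and S as in (11.7.2)."""
    ranks = [average_ranks(row) for row in table]
    rank_sums = [sum(col) for col in zip(*ranks)]
    k = len(rank_sums)
    s = sum(r * r for r in rank_sums) - sum(rank_sums) ** 2 / k
    return rank_sums, s

rank_sums, S = friedman_s(weals)
n, k = len(weals), len(weals[0])
assert sum(rank_sums) == n * k * (k + 1) / 2      # the check (11.7.1)

chi2 = 12 * S / (n * k * (k + 1))                 # approximation (11.7.3)
# Upper tail of chi-squared with 3 d.f. (closed form, 3 d.f. only).
p_approx = (math.erfc(math.sqrt(chi2 / 2))
            + math.sqrt(2 * chi2 / math.pi) * math.exp(-chi2 / 2))
print(rank_sums, S, chi2, round(p_approx, 3))
```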

If it were of interest to find out whether there was a difference 
between blocks, exactly the same method would be used (e.g. inter- 
change the words block and treatment throughout this section). 

Multiple comparisons. As in §§ 11.4-11.6, the conclusion that the 
treatments do not all have the same effect says nothing about which 
ones differ from which others. It would not be correct to perform sign 
tests on all possible pairs of treatment groups, in order to find out, for 
example, whether treatment B differs from treatment D. A method of 
answering this question is given in § 11.9. 

11.8. The Latin square and more complex designs for experiments 

There is a vast literature describing ingenious designs for experiments 
but the analysis of almost all of these depends on the assumption of a 
normal distribution of errors and on elaborations of the models des- 
cribed in § 11.2. If the experiments are large there is, in some cases, 
some evidence that the methods will not be very sensitive to the 
assumptions. As the assumptions are rarely checkable with the amount 
of data available it may be as well to treat these more complex designs 
with caution (see comments below about use of small Latin squares). 
Certainly if they are used the advice of a critical professional statistician 
should be sought about the exact nature of the assumptions being 
made, and the interpretation of the results in the light of the mathe- 
matical model (see § 11.2). 

To emphasize the point it should be sufficient to quote Kendall and 
Stuart (1966, p. 139): 'The fact that the evidence for the validity of 
normal theory tests in randomized Latin squares is flimsy, together 
with the even greater paucity of such evidence for most other, more 
complicated, experiment designs, leads one to doubt the prevailing 
serene assumption that randomization theory will always approximate 
normal theory.' 

The Latin square 

The experiment summarized in Table 11.6.2, was actually arranged 
so that each of the four injection sites (e.g. anterior and posterior on 
each side), received every treatment once, according to the design 
shown in Table 11.8.1(a). The measurements, from Table 11.6.2, are 
given in Table 11.8.1(b). 

In the randomized block design (§11.6) each treatment appeared 
once, in random order, in each block (row). In the design shown in 
Table 11.8.1, which is called a Latin square, there is the additional 
restriction that each treatment appears once in each column so that the 
column totals are comparable. The number of columns (injection sites) 
as well as the number of blocks (rows, guinea pigs) must be the same 
as (or a multiple of) the number of treatments. If a model like (11.2.2), 
but with another additive component characteristic of each column 
(injection site), is supposed to represent the real observations, then a 
sum of squares (see §§ 11.2, 11.3, 11.4, and 11.6) can be found from the 
observed scatter of the column totals (the corresponding mean square 
would, as usual, estimate $\sigma^2$ if the null hypothesis were true), 
and used in the Gaussian analysis of variance to eliminate errors due to 
systematic differences between columns (injection sites). The sum of 
squares is found from column totals, and number of observations per 
total (4 in this case), using (11.4.5) again. 

SSD between injection sites (columns) 

$$= \frac{228^2}{4} + \frac{227^2}{4} + \frac{236^2}{4} + \frac{234^2}{4} - 53\,476.5625 = 14.6875 \tag{11.8.1}$$

with 3 degrees of freedom (because there are 4 columns). The sums of 
squares for differences between treatments and between guinea pigs, 
and the total sum of squares, are exactly as in § 11.6. When these 
results are filled into Table 11.8.2, the error sum of squares and degrees 
of freedom can be found by difference, and the rest of the table 
completed as in § 11.6. Referring the variance ratio, F(3,6) = 1.6, to 
tables (see § 11.3) shows that there is no evidence for a population 
difference between injection sites (P > 0.2). 

Table 11.8.1 
The Latin square design 

(a)                                  (b) 
            Column                   Guinea      Injection site 
Row      1    2    3    4            pig        1     2     3     4     Total 
1        A    B    C    D            1         41    61    62    43      207 
2        C    A    D    B            2         62    48    48    68      226 
3        D    C    B    A            3         53    66    70    53      242 
4        B    D    A    C            4         72    52    56    70      250 
                                     Total    228   227   236   234      925 

Table 11.8.2 
Analysis of variance for the Latin square 

Source of variation            d.f.   Sum of       Mean       F        P 
                                      squares      square 
Between antibody preparations 
  (treatments)                   3    1188.6875    396.23     129.4    < 0.001 
Between guinea pigs 
  (rows)                         3     270.6875     90.23      29.5    < 0.001 
Between sites (columns)          3      14.6875      4.89       1.6    > 0.2 
Error                            6      18.3750      3.06 
Total                           15    1492.4375 
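
The entries of Table 11.8.2 can be recomputed from the design and the 
measurements of Table 11.8.1, as in the following sketch. The 
representation of the square as four strings of letters, and all the 
names used, are assumptions made here for illustration. 

```python
design = ["ABCD", "CADB", "DCBA", "BDAC"]            # Table 11.8.1(a)
measured = [[41, 61, 62, 43], [62, 48, 48, 68],
            [53, 66, 70, 53], [72, 52, 56, 70]]      # Table 11.8.1(b)

def ss_from_totals(totals, per_total, correction):
    """Sum of squares from totals, in the manner of (11.4.5)."""
    return sum(t * t for t in totals) / per_total - correction

size = len(design)
correction = sum(sum(row) for row in measured) ** 2 / size ** 2

row_totals = [sum(row) for row in measured]           # guinea pigs
col_totals = [sum(col) for col in zip(*measured)]     # injection sites
treat_totals = {}                                     # preparations
for letters, row in zip(design, measured):
    for letter, y in zip(letters, row):
        treat_totals[letter] = treat_totals.get(letter, 0) + y

ss_rows = ss_from_totals(row_totals, size, correction)
ss_cols = ss_from_totals(col_totals, size, correction)
ss_treat = ss_from_totals(treat_totals.values(), size, correction)
ss_total = sum(y * y for row in measured for y in row) - correction

# Error sum of squares and d.f. by difference.
ss_error = ss_total - ss_rows - ss_cols - ss_treat
df_error = (size ** 2 - 1) - 3 * (size - 1)
ms_error = ss_error / df_error

f_treat = (ss_treat / (size - 1)) / ms_error
f_rows = (ss_rows / (size - 1)) / ms_error
f_cols = (ss_cols / (size - 1)) / ms_error
print(col_totals, ss_cols, ss_error, df_error)
print(round(f_treat, 1), round(f_rows, 1), round(f_cols, 1))
```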

Choosing a Latin square at random 

As usual it is essential that the treatments be applied randomly, 
given the restraints of the design. This means the design, Table 11.8.1(a), 
actually used in the experiment had the same chance of being 
chosen as each of the 575 other possible 4 × 4 Latin squares. The 
selection of a square at random is not as straightforward as it might 
appear at first sight and is frequently not done correctly. Fisher and 
Yates (1963) give a catalogue of Latin squares (Table 25), and instruc- 
tions for choosing a square at random (introduction, p. 24). 
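
For a square as small as 4 × 4, one way of making sure that every 
square has the same chance of being chosen is to list all of them and 
pick one with random numbers. The sketch below does this by brute 
force; it is not the procedure given by Fisher and Yates, merely an 
alternative that a computer makes feasible for small squares, and the 
function name is hypothetical. 

```python
import random
from itertools import permutations

def all_latin_squares(symbols="ABCD"):
    """List every Latin square on the given symbols, built row by row."""
    size = len(symbols)
    rows = list(permutations(symbols))
    squares = []

    def extend(partial):
        if len(partial) == size:
            squares.append(tuple(partial))
            return
        for row in rows:
            # A row may be added only if it repeats no symbol
            # already present in the same column.
            if all(row[c] != prev[c] for prev in partial for c in range(size)):
                extend(partial + [row])

    extend([])
    return squares

squares = all_latin_squares()
chosen = random.choice(squares)      # each square equally likely
print(len(squares))
for row in chosen:
    print(" ".join(row))
```

The count printed should agree with the 576 squares (the one used plus 
575 others) mentioned above. For larger squares the number of 
possibilities grows far too quickly for complete listing. 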

Are Latin squares reliable ? 

The answer is that if the assumptions of the mathematical model 
are true, then they are an excellent way of eliminating experimental 
errors of two sorts (e.g. guinea pigs and injection sites) from the 
comparisons between treatments, which are of primary interest. 
However, as usual, it is very rare that there is any information about 
whether the model is correct or not. In the case of the t test the Gaussian 
approach could be justified because it has been shown to be a good 
approximation to the randomization method (§ 9.2) if the samples are 
large enough. However, there is much less information on the sensitivity 
of Latin squares (and more complex designs) to departures from the 
assumptions. In the case of the 4x4 Latin square the randomization 
method does not, in general, give results in agreement with the Gaussian 
analysis so one is totally reliant on the assumptions of the latter being 
sufficiently nearly true. It is thus doubtful whether Latin squares as 
small as 4 x 4 should be used in most circumstances, though the larger 
squares are thought to be safer (see Kempthorne 1952). 

Incomplete block designs 

In § 11.6, the randomized block method was described for eliminating 
errors due to differences between blocks, from comparisons between 
treatments. Sometimes it may not be possible to test every treatment 
on every block as when, for example, four treatments are to be com- 
pared, but each patient = block is only available for long enough 
to receive two. It is sometimes still possible to eliminate differences 
between blocks even when each block does not contain every treatment. 
Catalogues of designs are given by Fisher and Yates (1963, pp. 25, 
91-3) and Cochran and Cox (1957). 

A nonparametric analysis of incomplete block experiments has been 
given by Durbin (1951). 

Examples of the use of balanced incomplete block designs for bio- 
logical assays (see §13.1) have been given by, for example, Bliss 
(1947) and Finney (1964). General formulas for the simplest analysis 
of biological assays based on balanced incomplete blocks are given by 
Colquhoun (1963). 

11.9. Data snooping. The problem of multiple comparisons 

In all forms of analysis of variance discussed, it has been seen that 
all that can be inferred is whether or not it is plausible that all of the 
k treatments (or blocks, etc.) are really identical. If there are more 
than two treatments the question of which ones differ from which 
others is not answered. The obvious answer is never to bother with the 
analysis of variance but to test all possible pairs of treatments by the 
two sample methods of Chapters 9 and 10. However, it must be re- 
membered that it is expected that the null hypothesis will sometimes 
be rejected even when it is true (see § 6.1), so if a large number of 
tests are done some will give the wrong answer. In particular, if several 
treatments are tested and the results inspected for possible differences 
between means, and the likely looking pairs tested ('data selection', 
or as statisticians often call it 'data snooping'), the P value obtained 
will be quite wrong. 

This is made obvious by considering an extreme example. Imagine 
that sets of, say, 100 samples are drawn repeatedly from the same 
population (i.e. null hypothesis true), and each time the sample out of 
the set of 100 with largest mean is tested, using a two-sample test, 
against the sample with the smallest mean. With 100 samples the 
largest mean is likely to be so different from the smallest that the 
null hypothesis (that they come from the same population) would be 
rejected (wrongly) almost every time the experiment was repeated, 
not only in 1 or 5 per cent (according to what value of P is chosen as 
low enough to reject the null hypothesis) of repeated experiments as it 
should be (see § 6.1). If the particular treatments to be compared are 
not chosen before the results are seen, allowance must be made for data 
snooping. There are various approaches. 
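
The extreme example just described is easily imitated by simulation. 
The sketch below is only an illustration under assumptions that are not 
in the text: 10 observations per sample, a Gaussian population, 200 
repetitions, and the two-tail 5 per cent point of Student's t with 18 
degrees of freedom (2.101) as the criterion for rejection. 

```python
import random
import statistics

def snooping_rejection_rate(n_samples=100, n_per_sample=10,
                            n_experiments=200, t_critical=2.101, seed=1):
    """Fraction of experiments in which largest vs. smallest mean 'differ'.

    All samples come from one population, so every rejection is wrong.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        samples = [[rng.gauss(0.0, 1.0) for _ in range(n_per_sample)]
                   for _ in range(n_samples)]
        high = max(samples, key=statistics.fmean)
        low = min(samples, key=statistics.fmean)
        # Pooled variance for two samples of equal size.
        pooled = (statistics.variance(high) + statistics.variance(low)) / 2
        std_error = (2 * pooled / n_per_sample) ** 0.5
        t = (statistics.fmean(high) - statistics.fmean(low)) / std_error
        if abs(t) > t_critical:
            rejections += 1
    return rejections / n_experiments

rate = snooping_rejection_rate()
print(rate)
```

If the two samples had been chosen before the results were seen, the 
fraction printed would be expected to be near 0.05; with selection of 
the extremes it comes out close to 1. 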

One way is to compare all possible pairs of treatments. This is 
probably the most generally useful, and methods of doing it for both 
nonparametric and Gaussian analysis of variance are described below. 

Another case arises when one of the treatments is a control and it is 
required to test the difference between each of the other treatment 
means and the control mean. In the Gaussian analysis of variance 
this is done by finding confidence intervals for the jth difference 
as difference $\pm\, d s \sqrt{1/n_c + 1/n_j}$, where $n_c$ is the number 
of control observations, $n_j$ the number of observations on the jth 
treatment, s is the square root of the error mean square from the 
analysis of variance, and d is a quantity (analogous to Student's t) 
tabulated by Dunnett (1964). Tables for doing the same sort of thing in 
the nonparametric analyses of variance are given by Wilcoxon and Wilcox 
(1964). 

A third possibility is to ask whether the largest of the treatment 
means differs from the others. Nonparametric tables are given by 
McDonald and Thompson (1967). 

The critical range method for testing all possible pairs in the Kruskal- 
Wallis nonparametric one way analysis of variance (§ 11.5) 

Using this method, which is due to Wilcoxon, all possible pairs of 
treatments can be compared validly using Table A7, though the table 
only deals with equal sample sizes. The procedure is very simple. 
Just calculate the difference between the rank sums for any pair of 
groups that is of interest. If this difference is equal to (or larger than) 
the critical range given in Table A7, the P value is equal to (or less 
than) the value given in the table. For small samples exact probabilities 
are given in the table (they cannot be made exactly the same as the 
approximate P values at the head of the column because of the dis- 
continuous nature of the problem, as in § 7.3 for example). For larger 
samples use the approximate P value at the head of the column. 

The first example of the Kruskal-Wallis analysis given in § 11.5 
cannot be used to illustrate the method because it has unequal groups. 
The second example in § 11.5, based on the (parenthesized) ranks in 
Table 11.4.1, will be used. In this example there were k = 4 treatments 
and n = 7 replicates, and evidence was found in § 11.5 that the 
treatments were not equi-effective. Consulting Table A7 shows that a 
difference between two rank sums (selected from four) of 79.1 or larger 
would occur in about 5 per cent of random allocations of the 28 subjects 
to 4 groups (i.e. in about 5 per cent of repeated experiments if the 
null hypothesis were true), that is to say P ≈ 0.05 for a difference of 
79.1. Similarly P ≈ 0.01 for a difference of 95.8. 

The simplest way of writing down the differences between all six 
possible pairs of rank sums is to construct a table of differences, with 
the rank sums from § 11.5 (or Table 11.4.1), as in Table 11.9.1. The 
treatments have been arranged in ascending order so the largest 
differences occur together in the bottom left-hand corner of the table. 



Table 11.9.1 



Treatment 




4 


1 


2 


a 




rank sum 


61 


71-6 


121 


152-5 


4 


61 




1 


71-6 










2 


121 


600 


40-5 






3 


162-5 


91-6* 


810* 


31-5 

















10.5 



The differences marked with an asterisk in Table 11.9.1 are larger than
79.1 but less than 95.8. So P is somewhere between 0.01 and 0.05 for
these differences, suggesting (see § 6.1) that there is a real difference
between treatments 3 and 1, and between treatments 3 and 4. All
other differences are less than 79.1 so there is little evidence (P > 0.05)
for any other treatment differences.
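
The table of differences can also be built mechanically. The following is a minimal sketch, not part of the original text, in which the function names are invented for illustration. It takes the four rank sums as 61, 71.5, 121 and 152.5 (they must total 28 × 29/2 = 406) and uses only the two critical ranges quoted above from Table A7; no other entry of that table is assumed.

```python
from itertools import combinations

# Rank sums for the four treatments (k = 4, n = 7 replicates each).
rank_sums = {4: 61.0, 1: 71.5, 2: 121.0, 3: 152.5}

# Critical ranges quoted in the text from Table A7 for k = 4, n = 7.
CRIT_P05 = 79.1  # approximate P = 0.05
CRIT_P01 = 95.8  # approximate P = 0.01


def pairwise_differences(sums):
    """Absolute difference between the rank sums of every pair of treatments."""
    return {(a, b): abs(sums[a] - sums[b]) for a, b in combinations(sums, 2)}


def classify(diff):
    """Bracket P for one rank-sum difference using the two critical ranges."""
    if diff >= CRIT_P01:
        return "P <= 0.01"
    if diff >= CRIT_P05:
        return "0.01 < P <= 0.05"
    return "P > 0.05"


diffs = pairwise_differences(rank_sums)
for pair, d in sorted(diffs.items(), key=lambda item: item[1]):
    print(pair, d, classify(d))
```

Only the pairs (4, 3) and (1, 3) reach the 79.1 critical range, in agreement with the asterisks in Table 11.9.1.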

The critical range method for testing all possible pairs in the Friedman 
nonparametric two way analysis of variance (§ 11.7) 

This method, also due to Wilcoxon, allows valid comparison of any 
pair of treatments in the Friedman method (§ 11.7), using Table A8 
in much the same way as just described for the one way analysis. 

The results in Table 11.7.1 will be used to illustrate the method.
There are k = 4 treatments and n = 4 blocks (replicates), so reference
to Table A8 shows that a difference (between any two treatment rank
sums selected from the four) as large as, or larger than, 11 would be
expected in only 0.5 per cent of repeated experiments if the null
hypothesis (see § 11.7) were true, i.e. if ranks were allocated randomly
within blocks. Similarly a difference of 10 would correspond to
P = 0.026.

A table of all possible pair differences between the rank sums from 
Table 11.7.1 can be constructed as above in Table 11.9.2. 



Table 11.9.2

    Treatment              A        D        C        B
               Rank sum    6        6        13       15

        A      6
        D      6           0
        C      13          7        7
        B      15          9        9        2

All six differences are less than 10, i.e. none reaches the P = 0.026
level of significance. Despite the evidence (in § 11.7) that the four
treatments are not equi-effective, it is not, in this case, possible to
detect with any certainty which treatments differ from which others.
This is not so surprising looking at the ranks in Table 11.7.1, but
looking at the original figures in Table 11.6.2 suggests strongly that
treatments B and C give larger responses than A and D. In fact, if the
assumptions of Gaussian methods (see § 11.2) are thought justifiable,
the Scheffé method could be used, and it is shown below that it gives
just this result. The reason for the apparent lack of sensitivity of the
rank method with the small samples is similar to that discussed in
§ 10.5 for the two-sample case.

Scheffé's method for multiple comparisons in the Gaussian analysis of
variance

The Gaussian analogue of the critical range methods just described
is the method of Tukey (see, for example, Mood and Graybill (1963,
pp. 267-71)), but Scheffé's method is more general.

Suppose there are k treatments. Define, in general, a contrast (see
examples below, and also § 13.9) between the k means as

$$L = \sum_j a_j \bar{y}_j \tag{11.9.1}$$

where $\sum a_j = 0$. The values of $a_j$ are constants, some of which may be
zero. When the $\bar{y}_j$ are the means of independent samples the estimated
variance of this contrast follows from (2.7.10) and is

$$\mathrm{var}(L) = \sum_j a_j^2 \frac{s^2}{n_j} = s^2 \sum_j \frac{a_j^2}{n_j} \tag{11.9.2}$$

where $s^2$ is the variance of y (the error mean square from the analysis
of variance) and $n_j$ is the number of observations in the jth treatment
mean, $\bar{y}_j$. The method is to construct confidence limits (see Chapter 7)
for the population (true) mean value of L as

$$L \pm S\sqrt{\mathrm{var}(L)} \tag{11.9.3}$$

where $S = \sqrt{(k-1)F}$, and F is the value of the variance ratio (see
§ 11.3) for the required probability. For the numerator F has (k − 1)
degrees of freedom, and for the denominator the number of degrees of
freedom associated with $s^2$. If the confidence limits include any
hypothetical value of L the observations cannot be considered incompatible
with this value, as explained in § 9.4.

Three numerical examples

Example 1. Suppose that it were decided to test whether the largest
mean in Table 11.4.1 ($\bar{y}_3 = 33.57$) really differs from the smallest
($\bar{y}_4 = 14.29$), this pair being chosen after the results were known. If
in (11.9.1) we take $a_1 = 0$, $a_2 = 0$, $a_3 = +1$ and $a_4 = -1$ then
$L = \bar{y}_3 - \bar{y}_4 = 19.28$, the difference between means. From Table 11.4.3
it is seen that $s^2 = 101.15$ with 24 degrees of freedom. Thus, by (11.9.2),
$\mathrm{var}(L) = 101.15\,(0^2/7 + 0^2/7 + 1^2/7 + (-1)^2/7) = 28.90$. There are k = 4
treatments so to find the 99 per cent confidence limits, the variance
ratio for P = 0.01 (= 1 − 0.99) with 3 and 24 degrees of freedom is
required. From the tables (see § 11.3) this is found to be 4.72. Thus
$S = \sqrt{(4-1) \times 4.72} = 3.763$. The P = 0.99 confidence limits are
$19.28 \pm 3.763\sqrt{28.90} = 19.28 \pm 20.23$, i.e. −0.95 to +39.51. The
limits include 0, so the difference between the two means cannot be
considered to differ from 0 at the P = 0.99 level of confidence. In
other words a significance test (see § 6.1) for the difference between
the largest and smallest means would give P > 0.01 (compare § 9.4).
And, because $S\sqrt{\mathrm{var}(L)}$ is the same for any pair of means, the same
can be said of any pair of means differing by less than 20.23 (see
Example (3) also).

Now try the 97.5 per cent limits. From the Biometrika tables (see
§ 11.3) the value of F(3, 24) for P = 0.025 is 3.72, so
$S\sqrt{\mathrm{var}(L)} = \sqrt{3 \times 3.72 \times 28.90} = 17.96$. This is less than the
observed difference, 19.28, so the result of a significance test would be
that P is between 0.025 and 0.01, suggesting, though not with great
confidence, a real difference.

Example 2. As another example, suppose that it were wished to test
the null hypothesis that the mean of the two more effective treatments
(2 and 3) is equal to the mean of the other two treatments in Table
11.4.1. To do this, take $a_1 = -1$, $a_2 = +1$, $a_3 = +1$ and $a_4 = -1$ so
$L = (\bar{y}_2 + \bar{y}_3) - (\bar{y}_1 + \bar{y}_4)$. The true (population) value of this will be
zero if the hypothesis to be tested is true. The sample value is
L = −15.57 + 26.86 + 33.57 − 14.29 = 30.57. From (11.9.2),
$\mathrm{var}(L) = 101.15\,((-1)^2/7 + 1^2/7 + 1^2/7 + (-1)^2/7) = 57.80$. S = 3.763 exactly as
above, so the 99 per cent (P = 0.99) confidence limits for the population
value of L are $30.57 \pm 3.763\sqrt{57.80} = 30.57 \pm 28.61$, i.e. +1.96
to +59.18. The limits do not include zero so the null hypothesis that
the true (population) value of L is zero would be rejected if P < 0.01
were considered sufficiently small (see § 6.1). The same could be said
of any difference (between the sum of any two means and the sum of
the other two) that exceeded 28.61.
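
The arithmetic of Examples 1 and 2 can be checked with a few lines of code. What follows is only a sketch under stated assumptions: the function name `scheffe_limits` is invented here, the four treatment means are taken as 15.57, 26.86, 33.57 and 14.29 (the values that reproduce L = 19.28 and L = 30.57), and the variance ratio is supplied by the user from tables rather than computed.

```python
import math


def scheffe_limits(means, coeffs, n_per_group, s2, f_value):
    """Scheffe confidence limits for the contrast L = sum(a_j * mean_j).

    f_value is the tabulated variance ratio with (k - 1) and error degrees
    of freedom for the chosen probability; it is looked up, not computed.
    """
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("contrast coefficients must sum to zero")
    k = len(means)
    contrast = sum(a * m for a, m in zip(coeffs, means))              # (11.9.1)
    var_l = s2 * sum(a * a / n for a, n in zip(coeffs, n_per_group))  # (11.9.2)
    s_mult = math.sqrt((k - 1) * f_value)
    half_width = s_mult * math.sqrt(var_l)                            # (11.9.3)
    return contrast, var_l, contrast - half_width, contrast + half_width


means = [15.57, 26.86, 33.57, 14.29]  # treatments 1 to 4
n = [7, 7, 7, 7]                      # replicates per treatment
s2 = 101.15                           # error mean square, 24 d.f.
F_01 = 4.72                           # F(3, 24) for P = 0.01, from tables

ex1 = scheffe_limits(means, [0, 0, 1, -1], n, s2, F_01)   # Example 1
ex2 = scheffe_limits(means, [-1, 1, 1, -1], n, s2, F_01)  # Example 2
print("Example 1: L=%.2f var=%.2f limits %.2f to %.2f" % ex1)
print("Example 2: L=%.2f var=%.2f limits %.2f to %.2f" % ex2)
```

Passing the tabulated F value as an argument keeps the sketch free of any dependence on a statistics library; a different probability level only requires a different value of `f_value`.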

Example 3. The method can be used, at least as an approximation,
for randomized block experiments also. For the results in Table 11.6.2,
$s^2 = 3.674$ with 9 d.f. (from Table 11.6.3). To test $\bar{y}_2$ against $\bar{y}_1$ take
$a_1 = -1$, $a_2 = +1$, $a_3 = 0$, $a_4 = 0$, as in Example (1). There are
n = 4 replicates, so $\mathrm{var}(L) = 3.674\,((-1)^2/4 + 1^2/4 + 0^2/4 + 0^2/4) = 1.837$,
from (11.9.2). And this value will be the same for the difference
between any two means. There are k = 4 treatments so values of
F(3, 9) are required. From the Biometrika tables (see § 11.3) the
P = 0.25 value is 1.63 and the P = 0.001 value is 13.90. Thus,
$S = \sqrt{3 \times 1.63} = 2.211$ and $S\sqrt{\mathrm{var}(L)} = 2.211\sqrt{1.837} = 2.996$ for
P = 0.25. And for P = 0.001, $S = \sqrt{3 \times 13.90} = 6.457$, so
$S\sqrt{\mathrm{var}(L)} = 8.749$. The differences between the six possible pairs of means
from Table 11.6.2 are, tabulating as above, shown in Table 11.9.3.

Table 11.9.3

    Treatment              D        A        C        B
               Mean        49.0     49.5     65.0     67.7

        D      49.0
        A      49.5        0.5
        C      65.0        16.0*    15.5*
        B      67.7        18.7*    18.2*    2.7





The four differences marked by an asterisk in Table 11.9.3 are greater
than 8.749 so the null hypothesis that the true values of these
differences are zero can be 'rejected' at P < 0.001 (see § 6.1). The other
differences are less than 2.996 so there is no evidence (P > 0.25) that
these differences are due to anything but experimental error. It is
concluded that treatments B and C both give larger responses than
treatments A and D, but that no difference can be detected between
A and D, or between B and C. Compare this result with rank analysis
of the same observations (§§ 11.7 and 11.9, above). Remember that a
normal (Gaussian) distribution has been assumed throughout these
calculations despite the fact that no evidence was presented, in any of
the examples, to suggest that this assumption was justified.






12. Fitting curves. The relationship 
between two variables 



12.1. Nature of the problem

In all the examples discussed so far, measurements of only one variable 
have been involved (e.g. blood sugar level or change in duration of 
sleep). However, experiments are often concerned with the relationship 
between two (or more) variables ; for example, dose of drug and response, 
concentration and optical density, time and extent of chemical reaction, 
or school and university examination results. The last of these examples 
is rather different from the others and suggests that two sorts of 
situation occur. 

(a) One variable can be measured accurately and its value chosen by 
the experimenter, for example the time when a measurement is taken, 
or the dose of a drug. This sort of variable is called an independent 
variable (notice that independent in this context has a different meaning 
from that encountered in §§2.4 and 2.7). The other variable, called the 
dependent variable, is subject to experimental error, and its value 
depends on the value chosen for the independent variable. For example, 
response is a dependent variable which is related to dose, the inde- 
pendent variable (as long as dose can be measured with negligible 
error). 

(b) Quite often the value of neither variable can be chosen by the 
experimenter, or measured without error. For example ability before 
and after university (as measured by school and university exam 
results) are both measured inaccurately. 

In both of these cases the first thing usually done is to plot the 
results and draw some sort of line through them. 

Case (a) is described (for historical reasons, now irrelevant) as a 
regression problem. The line fitted to the points is called a regression 
line, the formula for calculating it being the regression equation. This 
sort of problem is dealt with in §§ 12.1-12.8. 

The second type of problem, case (b), is a correlation problem and 
the graph of the results is often called a scatter diagram (see §§1.2 and 
12.9). 






The expression 'fitting a curve to the observed points' means the 
process of finding estimates of the parameters of the fitted equation 
that result in a calculated curve which fits the observations 'best' (in a 
sense to be specified below and in § 12.8). For example if a straight line 
is to be fitted the 'best' estimates of its arbitrary parameters (the slope
and intercept) are wanted. The method of fitting the straight line is 
discussed in detail in §§ 12.2 to 12.6 because it is the simplest problem. 
But very often, especially if one has an idea of the physical mechanism 
underlying the observations, the observations will not be represented 
by a straight line and a more complex sort of curve must be fitted. 
This situation is discussed in §§12.7 and 12.8. Often some way of 
transforming the observations to a straight line is adopted, but this 
may have considerable hazards, as explained in §§ 12.2 and 12.8.

It is, however, usually (see § 13.14) not justified to fit anything but a 
straight line if the deviations from the fitted line are no greater than 
could reasonably be expected by chance (i.e. than could be expected if 
the true line were straight). In general it is usually reasonable to use 
the simplest relationship consistent with the observations. By simplest 
is meant the equation containing the smallest number of arbitrary 
parameters (e.g. slope), the values of which have to be estimated from 
the observations. This is an application of 'Occam's razor' (one version 
of which states 'It is vain to do with more what can be done with fewer' : 
William of Occam, early fourteenth century). The reason for doing 
this is not that the simplest relationship is likely to be the true one, 
but rather because the simplest relationship is the easiest to refute 
should it be wrong. (The opposite would, of course, be true if the 
parameters were not arbitrary, and estimated from the observations, 
but were specified numerically by the theory.) 

The role of statistical methods 
Statistical methods are useful 

(1) for finding the best estimates of the parameters (see §§ 12.2, 12.7, 
and 12.8) of the chosen regression equation, and confidence limits 
(Chapter 7 and §§ 12.4-12.6) for these estimates, 

(2) to test whether the deviations of the observed points from the 
calculated points (the latter being obtained using the best estimates 
of the parameters) are greater than could reasonably be expected by 
chance, i.e. to test whether the type of curve chosen fits the observations 
adequately. It is important to remember (see § 6.1) that if observations
do not deviate 'significantly' from, say, a straight line, this does not
mean that the true relationship can be inferred to be straight (see
§ 13.14 for an example of practical importance).

The best fitting curve (see § 12.8) is usually found using the method of
least squares. This means that the curve is chosen that minimizes the
'badness of fit' as measured by the sum of the squares of the deviations
of the observations (y) from the calculated values (Y) on the fitted
curve. In other words, the values of the parameters in the regression
equation must be adjusted so as to minimize this sum of squares. In
the case of the straight line and some simple curves such as the parabola
the best estimates of the parameters can be calculated directly (see
§§ 12.2 and 12.7). The principle of least squares can be applied to
any sort of curve but for non-linear problems (see § 12.8; but note that
fitting some sorts of curve is a linear problem in the statistical sense,
as explained in § 12.7) it may not have the optimum properties that it
can be shown to have for linear problems (those of providing unbiased
estimates of the parameters with minimum variance, see § 12.8, and
Kendall and Stuart (1961, p. 75)). For linear problems these optimum
properties are, surprisingly, not dependent on any assumption about
the distribution of the observations, but the construction of confidence
limits and all the analyses of variance depend on the assumption that
the errors of the observations follow the Gaussian (normal) distribution,
so all regression methods must (unfortunately) be classed as parametric
methods (see § 6.2). Tests for normality are discussed in § 4.6.

12.2. The straight line. Estimates of the parameters 

It is assumed throughout this discussion of linear regression that the 
independent variable, x (e.g. time, concentration of drug, see § 12.1) 
can be measured reproducibly and its value fixed by the experimenter. 
The experimental errors are assumed to be in the observations on the 
dependent variable, y (e.g. response, see § 12.1). Suppose that several 
(k, say) values of the independent variable, $x_1, x_2, \ldots, x_k$, are chosen
and that for each observations are made on the dependent variable,
$y_1, y_2, \ldots, y_N$ (there being N observations altogether; N may be bigger
than k if several observations are made at each value of x, as in § 12.6).

In order to find the 'best' straight line by the method of least squares 
(see §§ 12.1, 12.7, and 12.8) it is necessary to find the line that will 
minimize the badness of fit as measured by the sum of squares of 
deviations of the dependent variable from the line, as shown in Fig. 
12.2.1. 



Thus the best fitting straight line is the one that minimizes the sum of
squared deviations

$$S = \sum_{j=1}^{N} d_j^2 = \sum_{j=1}^{N} (y_j - Y_j)^2 \tag{12.2.1}$$

where $y_j$ is the observed, and $Y_j$ the calculated, value of the dependent
variable corresponding to $x_j$. The resulting line is called the regression
line of y on x. If the deviations of points from the fitted line were not
measured vertically as in (12.2.1), but, say, horizontally, the least
squares line would be different from that found in the way just described
(it would be called the regression line of x on y), but this would not be
the correct approach when the experimental errors are supposed to
affect y only.

Fig. 12.2.1. The dependent variable y, plotted against the independent
variable x. Definition of $y_j$, $x_j$, $Y_j$ and $d_j$ for discussion of curve fitting.

The general equation for a straight line can be written Y = a' + bx
where a' is the intercept (i.e. the value of Y when x = 0). It will be
more convenient (for reasons explained in § 12.7) to write this in a
slightly different form, viz.

$$Y = a + b(x - \bar{x}) \tag{12.2.2}$$

where b is the slope, and a is the value of Y when $x = \bar{x}$ (so that
$a - b\bar{x}$, which is a constant, is the same as a'). The left-hand side is
written as capital Y to emphasize that the evaluation of the equation
gives the calculated value of the dependent variable, which will, in
general, differ from the observations (y) at the same value of x, unless
the observation happens to lie exactly on the calculated line.

The true (population) regression equation, assuming the line to be
really straight, can be written, for any specified value of x,

$$\mu = \text{population value of } y = \alpha + \beta(x - \bar{x}) \tag{12.2.3}$$

where α and β are the true parameters, of which the statistics a and b
are estimates made from a sample of observations. Because the
independent variable, x, is assumed to be measured with negligible error
there is no distinction between the observed and true values of x.

The problem is now to find the least squares estimates of α and β
from the observations. This will be done algebraically for the moment.
In § 12.7 the geometrical meaning of the algebra is explained. First
substitute the calculated value of Y at the jth value of x, which, from
(12.2.2), is $Y_j = a + b(x_j - \bar{x})$, into (12.2.1) giving

$$S = \sum_{j=1}^{N} [y_j - a - b(x_j - \bar{x})]^2. \tag{12.2.4}$$

Squaring the term in brackets gives

$$S = \sum_{j=1}^{N} [y_j^2 + a^2 + b^2(x_j - \bar{x})^2 - 2ay_j - 2by_j(x_j - \bar{x}) + 2ab(x_j - \bar{x})]$$

and therefore, using (2.1.5), (2.1.6), and (2.1.8),

$$S = \sum_{j=1}^{N} y_j^2 + Na^2 + b^2\sum_{j=1}^{N}(x_j - \bar{x})^2 - 2a\sum_{j=1}^{N} y_j - 2b\sum_{j=1}^{N} y_j(x_j - \bar{x}) + 2ab\sum_{j=1}^{N}(x_j - \bar{x}). \tag{12.2.5}$$

The last term in this equation is zero because, by (2.6.1), $\sum(x - \bar{x}) = 0$.
The object is to find, for the particular values of x used and the particular
values of y observed, the values of a and b which make S as small as
possible. For a particular set of results we are, for the moment, regarding
x and y values as fixed and a and b as variables. The usual procedure in
calculus for finding a minimum is to differentiate† and equate the
result to zero as illustrated (for a) in Fig. 12.2.2 (see Thompson (1965,
p. 78 et seq.)). A fuller explanation of this process is given in § 12.7.

† Because there are two variables, a and b, partial differential coefficients (with
curly ∂) are used. This makes the differentiation of (12.2.5) even simpler because it
means that when differentiating with respect to a, b is treated as a constant (and vice
versa). See § 12.7.



It is shown below (see (12.2.10)) how the least squares estimates can be
derived without using calculus at all. Thus, to find the least squares
value of a, differentiate (12.2.5) treating b as a constant

$$\frac{\partial S}{\partial a} = 2Na - 2\sum_{j=1}^{N} y_j = 0$$

therefore $Na = \sum y_j$, so

$$a = \frac{\sum y_j}{N} = \bar{y}. \tag{12.2.6}$$

Fig. 12.2.2. The sum of squared deviations (S) plotted against various
values of a using eqn. (12.2.5). The data (y and x values) are those in Table 12.…
and b was held constant at 3.00 (cf. Fig. 12.7.3). The slope of the curve, ∂S/∂a,
is zero at the minimum. The graph is discussed in detail in § 12.7.

Similarly, to find the least squares estimate of b, differentiate (12.2.5)
with respect to b, treating a as a constant,

$$\frac{\partial S}{\partial b} = 2b\sum(x_j - \bar{x})^2 - 2\sum y_j(x_j - \bar{x}) = 0$$

therefore $2b\sum(x_j - \bar{x})^2 = 2\sum y_j(x_j - \bar{x})$, or

$$b = \frac{\sum y_j(x_j - \bar{x})}{\sum(x_j - \bar{x})^2} \tag{12.2.7}$$

$$b = \frac{\sum(y_j - \bar{y})(x_j - \bar{x})}{\sum(x_j - \bar{x})^2}. \tag{12.2.8}$$

Although it is not immediately obvious, the numerators of these two
expressions for b are identical, as shown by (12.2.9) below.

Using (2.6.2) and (2.6.6) shows that the estimated slope, (12.2.8), can
be written b = cov(x, y)/var(x). It was shown in § 2.6 that
$\sum(y - \bar{y})(x - \bar{x})$, and hence the covariance, measure the extent to which y
tends to increase when x is increased.

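Proof that $\sum(y - \bar{y})(x - \bar{x})$ can be written $\sum y(x - \bar{x})$

$$\begin{aligned}
\sum_{j=1}^{N}(y_j - \bar{y})(x_j - \bar{x}) &= \sum[y_j x_j - y_j\bar{x} - \bar{y}x_j + \bar{y}\bar{x}] \\
&= \sum[y_j(x_j - \bar{x}) - \bar{y}(x_j - \bar{x})] \\
&= \sum y_j(x_j - \bar{x}) - \bar{y}\sum(x_j - \bar{x}) \\
&= \sum_{j=1}^{N} y_j(x_j - \bar{x})
\end{aligned} \tag{12.2.9}$$

because the last term in the penultimate equation is, by (2.6.1), zero.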
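
As an illustration of (12.2.6) to (12.2.9), the following minimal sketch computes the least squares estimates directly from their definitions and confirms that the two expressions for the slope agree. The five (x, y) pairs are hypothetical, invented purely for the illustration, and the function name `fit_line` is likewise an assumption of this sketch.

```python
def fit_line(x, y):
    """Least squares estimates for the line Y = a + b(x - xbar)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n                      # a = ybar, eqn (12.2.6)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    # Slope with raw y in the numerator, eqn (12.2.7).
    b_raw = sum(yi * (xi - xbar) for xi, yi in zip(x, y)) / sxx
    # Slope with deviations of y from its mean, eqn (12.2.8).
    b_dev = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y)) / sxx
    return ybar, b_raw, b_dev, xbar


# Hypothetical observations, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

a, b_raw, b_dev, xbar = fit_line(x, y)
print("a =", a, " b (12.2.7) =", b_raw, " b (12.2.8) =", b_dev)
print("intercept a' = a - b*xbar =", a - b_raw * xbar)
```

The two slopes agree to within rounding error, as (12.2.9) requires.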

How to find the least squares estimates without using calculus

The argument is exactly analogous to that already used for the arithmetic
mean in § 2.6 (p. 27). The sum of squares to be minimized, (12.2.4), can be
written

$$S = \sum[y_j - \hat{\alpha} - \hat{\beta}(x_j - \bar{x})]^2 = \sum[y_j - \bar{y} - b(x_j - \bar{x})]^2 + N(\hat{\alpha} - \bar{y})^2 + (\hat{\beta} - b)^2\sum(x_j - \bar{x})^2, \tag{12.2.10}$$

where $\hat{\alpha}$ and $\hat{\beta}$ denote possible estimates of α and β, and b denotes, as in § 12.7,
the estimate of β given by (12.2.7). This expression is analogous to (2.5.6).
It is easy to see that the values of $\hat{\alpha}$ and $\hat{\beta}$ that minimize this are $\hat{\alpha} = \bar{y}$ and
$\hat{\beta} = b$ (the same estimates as found above), because in this case the last two
terms will be zero, their minimum possible value. It can quite easily be shown
that (12.2.10) is an algebraic identity by inserting b from (12.2.7) and expanding
the right side, in the same sort of way as shown in detail for (2.5.6).
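
The identity (12.2.10) can also be checked numerically. The sketch below, which uses hypothetical data and invented function names, evaluates the sum of squares directly for some arbitrary trial values of the two parameters and compares it with the three-term decomposition; it is a check on the algebra, not a proof.

```python
def sum_of_squares(x, y, a_trial, b_trial):
    """S evaluated directly from (12.2.4) for trial parameter values."""
    xbar = sum(x) / len(x)
    return sum((yi - a_trial - b_trial * (xi - xbar)) ** 2
               for xi, yi in zip(x, y))


def decomposed(x, y, a_trial, b_trial):
    """Right-hand side of (12.2.10): minimum S plus two non-negative terms."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b_ls = sum(yi * (xi - xbar) for xi, yi in zip(x, y)) / sxx  # (12.2.7)
    s_min = sum_of_squares(x, y, ybar, b_ls)
    return s_min + n * (a_trial - ybar) ** 2 + (b_trial - b_ls) ** 2 * sxx


# Hypothetical observations, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

for a_trial, b_trial in [(5.0, 2.5), (6.0, 1.97), (7.3, -0.4)]:
    direct = sum_of_squares(x, y, a_trial, b_trial)
    split = decomposed(x, y, a_trial, b_trial)
    print(a_trial, b_trial, round(direct, 6), round(split, 6))
```

Because the last two terms of the decomposition cannot be negative, no trial pair can give a smaller S than the least squares estimates do.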

Assumptions made in the least squares fitting and analysis of straight
lines

(1) The standard deviation of y was assumed to be a constant. That
is to say, the observations have the same scatter at all points
along the curve so that equal weight can be attached to all observations
(as has been done in the above derivations, cf. (2.5.1) and (2.5.2)).
When this condition is fulfilled the observations are described as
homoscedastic. Quite often this condition is not fulfilled, as illustrated in
Fig. 12.2.3. For instance, it is quite commonly found in practice that
there is a tendency for the smaller observations to have less scatter,
in such a way that the relative scatter (e.g. the coefficient of variation,
(2.6.4)) is more nearly constant than the absolute scatter (e.g. the
standard deviation). If this is the case the observations (which are said
to show heteroscedasticity) should not be given equal weight, and this
makes the calculations more complicated (cf. Chapter 14).

(2) The population (true) relation between y and x has been assumed
to be a straight line. In § 12.6 it will be shown how it can be judged 
whether deviations from linearity can reasonably be ascribed to 
experimental error. 




(3) The independent variable has been assumed to be measured 
with negligible error. For a discussion of what to do when it is not see, 
for example, Brownlee (1965, p. 391). 

(4) The analyses to be described will all assume that the errors in 
the observations (y, the dependent variable) at each of the selected x 
values follow Gaussian (normal) distributions (see §§ 4.2, 4.6 and 12.1). 

The use of transformations 

This discussion is closely related to that in § 11.2, in which a method 
of choosing a transformation to equalize variances was described. 
Transformation (e.g. logarithm, square root, reciprocal) may be used 
to make results conform with the above assumptions. For example, 
if the observations are described by an exponential relationship,
$Y = Y_0 e^{-kx}$, then taking natural logarithms gives $\log Y = \log Y_0 - kx$.
The regression of log y on x should therefore be a straight line with
intercept = $\log Y_0$ and slope = −k. An example is worked out in
§ 12.6. Notice, however, that if y were homoscedastic and normally
distributed then log y would be neither, so it may not be possible to
satisfy all the assumptions simultaneously (see §§ 11.2, 12.8 and
Bartlett (1947)). Tests for normality are discussed in § 4.6.
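
The logarithmic transformation of an exponential relationship can be sketched as follows. The values of $Y_0$ and k, the x values and the function name are all hypothetical, and the 'observations' are generated without error so that the fitted line recovers the parameters exactly; with real observations the cautions about homoscedasticity and normality given above would apply.

```python
import math


def fit_line(x, y):
    """Least squares a and b for Y = a + b(x - xbar); also returns xbar."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y)) / sxx
    return ybar, b, xbar


# Hypothetical, error-free values from Y = Y0 * exp(-k * x).
Y0_TRUE, K_TRUE = 100.0, 0.5
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [Y0_TRUE * math.exp(-K_TRUE * xi) for xi in x]

# Regress the natural logarithm of y on x.
log_y = [math.log(yi) for yi in y]
a, b, xbar = fit_line(x, log_y)

intercept = a - b * xbar       # value of log Y at x = 0, i.e. log Y0
k_est = -b                     # slope of the log plot is -k
y0_est = math.exp(intercept)
print("k =", k_est, " Y0 =", y0_est)
```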

It is important to distinguish between the effects of transformations 
of the dependent variable, y, on one hand, and on the independent 
variable, x, on the other. Transformations of x are often used to make
a line straight (e.g. response, y, is often plotted against the log of the 
dose, x, in pharmacology). This merely alters the spacing at which 
points are plotted along the abscissa in Fig. 12.3, but cannot have any 
effect on the homoscedasticity or distribution of errors of the observa- 
tions, y. Transformations of y, on the other hand, affect these as well as 
linearity. 

12.3. Measurement of the error in linear regression 

Consider the straight line fitted to the results in Fig. 12.3.1. As
before, y stands for the observation at a particular value of x, and Y
for the predicted value of the dependent variable (i.e. that calculated
from the estimated line) at a particular x. The equation for the
estimated line, $Y = a + b(x - \bar{x})$, can be written, using (12.2.6),

$$Y = \bar{y} + b(x - \bar{x}), \tag{12.3.1}$$

from which it can be seen that the line must go through the point
$(\bar{x}, \bar{y})$ because $Y = \bar{y}$ when $x = \bar{x}$ (i.e. when $x - \bar{x} = 0$).

Fig. 12.3.1. Definition of terms used in curve fitting. The values of the
dependent variable are plotted on the ordinate and the independent variable (x)
on the abscissa (see §§ 12.1 and 12.2). The five observed values (○), $y_1$ to $y_5$,
have been plotted against the corresponding x values, $x_1$ to $x_5$, and a straight
line fitted to them. The nature of the terms (y − Y), (Y − ȳ), and (y − ȳ) occurring
in eqns. (12.3.2) and (12.3.3), is illustrated for the fourth x value.

This section is concerned only with errors in y, because x has been
assumed to be measured without error (§§ 12.1, 12.2). The total
deviation of the observed point from the mean, in Fig. 12.3.1, can be divided
into two parts: (y − Y) = deviation of observed value from the line,
and (Y − ȳ) = deviation of predicted value on the line from the mean of
all observations. This can be written

$$(y - \bar{y}) = (y - Y) + (Y - \bar{y}) \tag{12.3.2}$$

in which (y − ȳ) is the total deviation, (y − Y) is the deviation from the
straight line, and (Y − ȳ) is the part of the total deviation accounted
for by the linear relation between y and x.

It is now possible to use the analysis of variance approach. The total
sum of squared deviations (SSD) of each observation from the grand
mean of all observations is $\sum(y_j - \bar{y})^2$, and this total SSD can be divided
into two components (compare (12.3.2))

$$\sum(y_j - \bar{y})^2 = \sum(y_j - Y_j)^2 + \sum(Y_j - \bar{y})^2 \tag{12.3.3}$$

in which the first term on the right-hand side measures the extent
to which the observations deviate from the line and is called the SSD
for deviations from linearity. It is this that is minimized in finding least
squares estimates, see § 12.2. The second term on the right-hand side
measures the amount of the total variability of y from ȳ that is accounted
for by the linear relation between y and x, and is called the SSD due to
linear regression. That (12.3.3) is merely an algebraic identity following
from (12.3.2) will now be proved.

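Digression to prove (12.3.3), and to obtain a working formula for the sum
of squares due to linear regression

(1) To show that (12.3.3) follows from (12.3.2). The summations are, as before,
over all N observations of y (there may be more than one y at each x value).
From (12.3.2),

$$\begin{aligned}
\text{total SSD} = \sum(y - \bar{y})^2 &= \sum[(y - Y) + (Y - \bar{y})]^2 \\
&= \sum[(y - Y)^2 + 2(y - Y)(Y - \bar{y}) + (Y - \bar{y})^2] \\
&= \sum(y - Y)^2 + 2\sum(y - Y)(Y - \bar{y}) + \sum(Y - \bar{y})^2 \\
&= \sum(y - Y)^2 + \sum(Y - \bar{y})^2 \qquad \text{Q.E.D.}
\end{aligned}$$

In the last line the first term is the SSD for deviations from linearity and
the second is the SSD due to linear regression.

The central term in the penultimate equation is zero because

$$\begin{aligned}
2\sum(y - Y)(Y - \bar{y}) &= 2\sum[y - \bar{y} - b(x - \bar{x})][\bar{y} + b(x - \bar{x}) - \bar{y}] && \text{(from (12.3.1))} \\
&= 2\sum[yb(x - \bar{x}) - \bar{y}b(x - \bar{x}) - b^2(x - \bar{x})^2] \\
&= 2b\sum y(x - \bar{x}) - 2b^2\sum(x - \bar{x})^2 && \text{(from (2.6.1))} \\
&= 2b\sum y(x - \bar{x}) - 2b\sum y(x - \bar{x}) = 0. \quad \text{Q.E.D.} && \text{(from (12.2.7))}
\end{aligned}$$

(2) A working formula for the sum of squares due to linear regression

As usual, it is inconvenient to calculate the individual deviations (Y − ȳ), and
a more convenient working formula is used. As before, the summations are over
all N observations.

$$\begin{aligned}
\sum(Y - \bar{y})^2 &= \sum[\bar{y} + b(x - \bar{x}) - \bar{y}]^2 && \text{(from (12.3.1))} \\
&= \sum[b(x - \bar{x})]^2 = b^2\sum(x - \bar{x})^2 && \text{(from (2.1.6))}.
\end{aligned}$$

Substituting (12.2.8) for the slope, b, gives the alternative forms:

$$\text{SSD due to linear regression} = b^2\sum(x - \bar{x})^2 = \frac{[\sum(y - \bar{y})(x - \bar{x})]^2}{\sum(x - \bar{x})^2}. \tag{12.3.4}$$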
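
The partition (12.3.3) and the working formula (12.3.4) can be illustrated numerically. The following is a minimal sketch on hypothetical data (the five observations and the function name are invented for the purpose); it computes each sum of squares from its definition and shows that the two components add up to the total.

```python
def partition_ssd(x, y):
    """Return (total, deviations from linearity, regression, working formula)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((yi - ybar) * (xi - xbar) for xi, yi in zip(x, y))
    b = sxy / sxx                                        # slope, (12.2.8)
    fitted = [ybar + b * (xi - xbar) for xi in x]        # Y from (12.3.1)
    total = sum((yi - ybar) ** 2 for yi in y)
    deviations = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    regression = sum((fi - ybar) ** 2 for fi in fitted)
    working = sxy ** 2 / sxx                             # (12.3.4)
    return total, deviations, regression, working


# Hypothetical observations, invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]

total, deviations, regression, working = partition_ssd(x, y)
print("total SSD             ", round(total, 4))
print("deviations from line  ", round(deviations, 4))
print("due to regression     ", round(regression, 4))
print("working formula       ", round(working, 4))
```

In practice only the total and the working formula need be calculated; the SSD for deviations from linearity then follows by difference.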



12.4. Confidence limits for a fitted line. The important
distinction between the variance of y and the variance of Y

It was stated in § 12.2 that the method of fitting a straight line
there described involves the assumption that the scatter of the
observations does not depend on their size (see Fig. 12.2.3), i.e. that their
population (true) variance, $\sigma^2[y]$, is a constant, independent of the
value of x (and of y). The estimated value of $\sigma^2$ from a sample is $s^2[y]$
or var[y], the error mean square from the analysis of variance table
(see §§ 11.4, 12.5, and 12.6). The width of the confidence interval
for the population value of an observation is therefore the same
($\pm t\,s[y]$, (7.4.2)) whatever the size of the observation.

In practice a straight line is usually fitted for one of the following 
reasons. 

(a) To estimate the slope or intercept and their confidence limits 
(see §§ 7.2, 7.9, 12.5, and 12.6). 

(b) To predict values of y for a given x. For example, it may be
required to predict from the fitted line what response (Y) would be
produced by a particular dose (x), or what optical density (Y) a
solution of a particular concentration (x) will have. The error of such
a prediction is discussed in this section. There are two forms of the
problem, as in § 7.4.

(c) To predict the value of x required to produce a given (observed
or hypothetical) value of y. For example, the prediction of the dose
(x) needed for a given response, or of the concentration (x) of a solution
of a particular optical density. This sort of problem is probably the
most important in practice but its solution is rather more complicated
than for (a) and (b). Its solution will be given in § 13.14.

In case (b) confidence limits are required for a value of Y calculated 
from the fitted line, rather than for an observed value, y (see § 12.2) ; and 
to find these limits an estimate of the variance of Y will now be found. 
For the meaning and interpretation of confidence limits see §§ 7.2 and
7.9.

Equation (12.2.2) for the fitted line is $Y = a + b(x - \bar{x})$ where $a = \bar{y}$
and $b = \sum y(x - \bar{x})/\sum(x - \bar{x})^2$ (from (12.2.6) and (12.2.7)). Because the
independent variable (x) is assumed to be measured without error
(§ 12.2), terms involving only x can be treated, for the purposes of
assessing error, as constants. Because, as shown below, a and b, and
hence Y, are linear functions of the observations, it follows that if the
observations are normally distributed then a, b, and Y will be normally
distributed. Imagine the experiment being repeated many times on
repeated random samples from the same population, using the same x
values. From each experiment a and b are estimated, and Y, for
example the response for a particular dose, is calculated from the
fitted line (12.2.2). The variation of the repeated a, b, and Y values
should follow normal distributions with means α, β, and μ (see (12.2.3)),
and variances var[a], var[b], and var[Y] say. Compare the non-normal
distribution of parameter estimates found in the non-linear problem
discussed in § 12.8. If Y is normally distributed confidence limits can
be found as in § 7.4 once var[Y] is known. To find var[Y] it is necessary
to know the variances of a and b.



The variance of the estimated slope, b 

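The least squares estimate (b) of the true slope ($\beta$) given in eqn.
(12.2.7) can be written out term by term in the form

$$b = \frac{\sum y_j(x_j - \bar{x})}{\sum (x_j - \bar{x})^2} = y_1\frac{(x_1 - \bar{x})}{\sum (x_j - \bar{x})^2} + y_2\frac{(x_2 - \bar{x})}{\sum (x_j - \bar{x})^2} + \dots + y_N\frac{(x_N - \bar{x})}{\sum (x_j - \bar{x})^2}. \qquad (12.4.1)$$

Therefore b is a linear function of the observations, i.e. it can be written
in the form $\sum c_j y_j = c_1 y_1 + c_2 y_2 + \dots + c_N y_N$ where the $c_j$ are constants
(for more comments on this use of the word 'linear' see § 12.7). In this case
the constants are $c_j = (x_j - \bar{x})/\sum (x_j - \bar{x})^2$. The variance of b now follows
directly from (2.7.10) and is $\mathrm{var}[y]\sum c_j^2$. Now $c_j^2 = (x_j - \bar{x})^2/[\sum (x_j - \bar{x})^2]^2$
so $\sum c_j^2 = \sum (x_j - \bar{x})^2/[\sum (x_j - \bar{x})^2]^2$ and therefore, cancelling,

$$\mathrm{var}[b] = \frac{\mathrm{var}[y]}{\sum (x_j - \bar{x})^2}. \qquad (12.4.2)$$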

This gives a prediction of what the variance of repeated estimates of b
should be, based on the scatter of the observations, var[y], seen in
the one experiment actually done (see §§ 2.7 and 7.2). Notice that the
slope will be most accurately determined (var[b] smallest) when the
values of x are widely spaced, making the values of $(x - \bar{x})$ large, as
common sense suggests.

Confidence limits for the slope can be obtained just as in § 7.4
(because b is normally distributed when the observations are, see above)
as

$$b \pm t\sqrt{\mathrm{var}[b]}, \qquad (12.4.3)$$

where t is Student's t for the required P and with the number of d.f.
associated with var[y]. See §§ 12.5 and 12.6 for examples.
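The algebra behind (12.4.1) and (12.4.2) can be checked numerically. The following is a minimal sketch only: the five (x, y) pairs are hypothetical values invented for illustration, and the names `c`, `var_slope`, and `ratio` are not from the text.

```python
# Sketch: the slope b written as a weighted sum of the observations (eqn 12.4.1).
# The data below are hypothetical and serve only to illustrate the algebra.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
sxx = sum((xj - x_bar) ** 2 for xj in x)          # sum of (x_j - x_bar)^2

# Weights c_j = (x_j - x_bar) / sum (x_j - x_bar)^2
c = [(xj - x_bar) / sxx for xj in x]

b_weighted = sum(cj * yj for cj, yj in zip(c, y))  # sum of c_j * y_j
b_direct = sum(yj * (xj - x_bar) for xj, yj in zip(x, y)) / sxx

sum_c_sq = sum(cj ** 2 for cj in c)                # should equal 1 / sxx


def var_slope(var_y, xs):
    """var[b] = var[y] / sum (x_j - x_bar)^2, eqn (12.4.2)."""
    mean = sum(xs) / len(xs)
    return var_y / sum((v - mean) ** 2 for v in xs)


# Doubling the spacing of the x values quarters var[b] for the same var[y].
ratio = var_slope(1.0, [2 * v for v in x]) / var_slope(1.0, x)

print(b_weighted, b_direct, sum_c_sq, 1 / sxx, ratio)
```

The last line illustrates the remark that widely spaced x values give a more accurately determined slope.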

The variance of a 
By (12.2.6) $a = \bar{y}$ so $\mathrm{var}[a] = \mathrm{var}[\bar{y}] = \mathrm{var}[y]/N$, by (2.7.8).

Confidence limits for the true (population) straight line

The value of Y estimated from the line, $Y = a + b(x - \bar{x})$, is a linear
function of the observations because, as above, both a and b are. It
will therefore be normally distributed when the observations are.
The population mean value of Y at any given value of x is $\mu$ (see
(12.2.3)), so the error of a value of Y is $Y - \mu$ which has a population
mean value† of $\mu - \mu = 0$ and variance var[Y] (because $\mu$ is a constant).
The variance of Y is

$$\begin{aligned}
\mathrm{var}[Y] &= \mathrm{var}[\bar{y} + b(x - \bar{x})] && \text{(by (12.3.1))} \\
&= \mathrm{var}[\bar{y}] + \mathrm{var}[b(x - \bar{x})] && \text{(by (2.7.3))} \\
&= \mathrm{var}[\bar{y}] + (x - \bar{x})^2\,\mathrm{var}[b] && \text{(by (2.7.5))} \\
&= \mathrm{var}[y]\left(\frac{1}{N} + \frac{(x - \bar{x})^2}{\sum (x_j - \bar{x})^2}\right). && (12.4.4)
\end{aligned}$$

† See Appendix 1 for a rigorous definition. $E[Y - \mu] = E[Y] - E[\mu] = \mu - \mu = 0$.






Notice that the use of (2.7.3) assumes that $\bar{y}$ and b are uncorrelated
(i.e. in repeated experiments there will be no tendency for $\bar{y}$ to be
large in experiments when b is). This has not been proved but a similar
relationship is discussed in greater detail in §§ 13.8 and 13.10. See also
§ 12.7.

The confidence limits for $\mu$ (at a particular value of x), when it is
estimated by calculating Y, are, by (7.4.2),

$$Y \pm t\sqrt{\mathrm{var}[Y]}. \qquad (12.4.5)$$

Several points about (12.4.4) are worth noticing. First, although
the term $\sum (x_j - \bar{x})^2$ is a constant (depending on the particular values
of x chosen) for a given experiment, the term $(x - \bar{x})^2$ is not. The presence
of this latter term shows that the variance of Y, unlike that of y, is
dependent on the value of x. The variance of Y will be at a minimum
when $x = \bar{x}$ because at this point the second term, which involves
$x - \bar{x}$, disappears leaving $\mathrm{var}[Y] = \mathrm{var}[\bar{y}]$ as expected (because at
this point $Y = \bar{y}$, see § 12.3). It can also be seen that the variance of
Y (and hence the width of the confidence limits) increases as x deviates
in either direction from $\bar{x}$, because the deviation $(x - \bar{x})$ is squared and
therefore always positive.

The common sense of these results is discussed further when they
are illustrated numerically in §§ 12.5 and 13.14 (and plotted in Figs.
12.5.1 and 13.14.1).

Confidence limits for new observations

Just as in § 7.4 the situation is changed if instead of wishing to find
confidence limits for, say, the population (true) response, $\mu$, produced
by a particular concentration of drug, x, it is wished to find confidence
limits within which the mean ($\bar{y}_m$) of m observations of the response to
concentration x would be expected to lie. The best estimate of $\bar{y}_m$ is
the same as the best estimate of $\mu$, viz. $Y = a + b(x - \bar{x})$; but, as in
§ 7.4, its error is different. The error of the prediction is $Y - \bar{y}_m$,
which will be normally distributed with a population mean of $\mu - \mu = 0$.
Because the m new observations are supposed to be independent
observations from the same Gaussian population (2.7.3) can be used
giving $\mathrm{var}[Y - \bar{y}_m] = \mathrm{var}[Y] + \mathrm{var}[\bar{y}_m]$. Now var[Y] is given by (12.4.4),
and, by (2.7.8), $\mathrm{var}[\bar{y}_m] = \mathrm{var}[y]/m$, so, by exactly the same argument
as used to find (7.4.3), the confidence limits for $\bar{y}_m$ will be

$$Y \pm t\sqrt{\mathrm{var}[y]\left(\frac{1}{m} + \frac{1}{N} + \frac{(x - \bar{x})^2}{\sum (x_j - \bar{x})^2}\right)}. \qquad (12.4.6)$$






As expected (and as in § 7.4) this reduces to (12.4.5) when m is very
large so $\bar{y}_m$ becomes the same as $\mu$. The prediction is that if repeated
experiments are conducted and in each experiment the limits calculated,
then in 95 per cent of experiments (or any other chosen proportion,
depending on the value chosen for t) the mean of m new observations
will fall within the limits. The limits, and $\bar{y}_m$, will of course vary
from experiment to experiment (see § 7.9). This prediction is, as
usual, likely to be optimistic (see § 7.2). The use of this method is
illustrated in § 13.14 (and plotted in Fig. 13.14.1).
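The difference between limits for the population line and limits for the mean of m new observations can be sketched as follows. This is an illustration under stated assumptions, not a prescribed procedure: the function name `limits_for_line` is invented here, the summary values are those worked out for the example of § 12.5, and x = 180 is an arbitrary choice.

```python
import math


def limits_for_line(x, a, b, x_bar, sxx, var_y, n, t, m=None):
    """Limits for the population line when m is None, otherwise limits for
    the mean of m new observations at x (one extra term, var[y]/m)."""
    y_hat = a + b * (x - x_bar)
    var = var_y * (1.0 / n + (x - x_bar) ** 2 / sxx)   # variance of Y at x
    if m is not None:
        var += var_y / m                               # variance of the mean of m new observations
    half_width = t * math.sqrt(var)
    return y_hat - half_width, y_hat + half_width


# Summary values assumed for illustration (taken from the example in section 12.5);
# t = 2.776 is the tabulated Student's t for P = 0.95 and 4 d.f.
summary = dict(a=67.833, b=0.9715, x_bar=172.833, sxx=526.833,
               var_y=46.393, n=6, t=2.776)

line_lo, line_hi = limits_for_line(180.0, **summary)            # population line
one_lo, one_hi = limits_for_line(180.0, m=1, **summary)         # one new observation
big_lo, big_hi = limits_for_line(180.0, m=10 ** 9, **summary)   # very large m

print(line_lo, line_hi)
print(one_lo, one_hi)
print(big_lo, big_hi)
```

The limits for a single new observation are wider than those for the line, and the limits for very large m are indistinguishable from those for the line.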

12.5. Fitting a straight line with one observation at each x value

The results in Table 12.5.1 show a single observation on the dependent
variable, y, at each value of the independent variable, x. For example,
y might be the plasma concentration of a drug at a precisely measured
time x after administration. The common sense supposition that the
times have not been chosen sensibly will be confirmed by the analysis.
The assumptions necessary for the analysis have been discussed in
§§ 11.2, 12.1, and 12.2, and the meaning of confidence limits has been
discussed in §§ 7.2 and 7.9. These should be read before this section.



Table 12.5.1

              x       y

            160      59
            165      54
            169      64
            175      67
            180      85
            188      78

Totals     1037     407



There is a tendency for y to increase as x increases. Is this trend
statistically significant? To put the question more precisely, does the
estimated slope of this line, b, differ from zero to an extent that would
be likely to occur by random experimental error if the true slope of the
line, $\beta$, were in fact zero? In other words it is required to test the null
hypothesis that $\beta = 0$.

Fitting the straight line $Y = a + b(x - \bar{x})$
The least squares estimate of $\alpha$ in (12.2.3) is, by (12.2.6),

$$a = \bar{y} = 407/6 = 67.833.$$






The least squares estimate of the slope, $\beta$, is, by (12.2.8),

$$b = \sum (y - \bar{y})(x - \bar{x})/\sum (x - \bar{x})^2.$$

First calculate, by (2.6.5),

$$\sum (x - \bar{x})^2 = \sum x^2 - (\sum x)^2/N = 160^2 + 165^2 + \dots + 188^2 - 1037^2/6 = 526.833.$$

The sum of products is found using (2.6.7),

$$\begin{aligned}
\sum (y - \bar{y})(x - \bar{x}) &= \sum yx - \sum y\,\sum x/N \\
&= (59 \times 160) + \dots + (78 \times 188) - (407 \times 1037)/6 \\
&= 511.833.
\end{aligned}$$

Thus b = 511.833/526.833 = 0.9715. Also $\bar{x}$ = 1037/6 = 172.833.

Inserting these values in (12.2.2) gives the equation for the least
squares straight line,

$$Y = 67.833 + 0.9715(x - 172.833). \qquad (12.5.1)$$

This line is plotted in Fig. 12.5.1 together with the observed values.

Does the estimated slope, b = 0.9715, differ from zero by more than
could reasonably be expected if the population slope, $\beta$, were zero?
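The arithmetic of the fit can be reproduced in a few lines. The sketch below takes the six pairs of Table 12.5.1 to be x = 160, 165, 169, 175, 180, 188 and y = 59, 54, 64, 67, 85, 78, a reading that reproduces the totals 1037 and 407 and the sums quoted above; the variable names are illustrative only.

```python
# Least squares fit of Y = a + b(x - x_bar), using the working formulae quoted in the text.
x = [160, 165, 169, 175, 180, 188]
y = [59, 54, 64, 67, 85, 78]

n = len(x)
sum_x, sum_y = sum(x), sum(y)

# Sum of squares for x: sum x^2 - (sum x)^2 / N
sxx = sum(v * v for v in x) - sum_x ** 2 / n
# Sum of products: sum xy - (sum x)(sum y) / N
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n

a = sum_y / n          # estimate of the mean response, y_bar
x_bar = sum_x / n
b = sxy / sxx          # least squares slope


def fitted(x_value):
    """Value of Y on the fitted line at x_value."""
    return a + b * (x_value - x_bar)


print(round(sxx, 3), round(sxy, 3), round(a, 3), round(x_bar, 3), round(b, 4))
```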

The analysis of variance 

The analysis is performed with the observations (y values). The
independent variable, x, only comes in incidentally. The principle of
the method is described in § 12.3.

The total sum of squares, by (2.6.5), is

$$\sum (y - \bar{y})^2 = \sum y^2 - (\sum y)^2/N = 59^2 + \dots + 78^2 - 407^2/6 = 682.833.$$

The sum of squares due to linear regression, by (12.3.4), is

$$(511.833)^2/526.833 = 497.260.$$

The sum of squares due to deviations from linearity is found, using
(12.3.3), by difference, as 682.833 - 497.260 = 185.573.

There are 6 values of y so the total number of degrees of freedom is 5.
The sum of squared deviations (SSD) due to linear regression has one
d.f. because it corresponds to the calculation of one statistic (b) from
the observations (this is made obvious by the identity of the analysis
with a t test, shown at the end of this section). The analysis summarized
by (12.3.3) is tabulated in Table 12.5.2, which is completed and
interpreted in the same way as previous analyses of variance (e.g. Table
11.3, 11.4, and 11.5). The two figures in the mean square column would
be independent estimates of the same quantity ($\sigma^2$) if all 6 observations
were from a single population (with mean $\mu$ and variance $\sigma^2$). This way
of stating the null hypothesis implies that the population mean of the
observations is always $\mu$ (whatever the x value), i.e. it implies that
$\beta = 0$, the way in which the null hypothesis was put above. The
probability that the ratio of two independent estimates of the same
variance will be 10.72 (as observed, Table 12.5.2), or larger, is 0.02 to
0.05 (see §§ 11.3 and 11.4), i.e. 10.72 would be exceeded in something
between 2 and 5 per cent of experiments in the long run (the limitations
of the tables of F, see § 11.3, prevent P being found to any greater
accuracy).

Fig. 12.5.1. Observed points from Table 12.5.1.
Least squares estimate of straight line (eqn. (12.5.1)).
- - - 95 per cent confidence limits for Y, i.e. for the fitted line.
x Particular values of confidence limits calculated in the text.

In this analysis there is no good estimate of the experimental error,
because only one observation was made at each value of x. This analysis
should be compared with that in § 12.6, in which replication of the
observations gives a proper estimate of $\sigma^2$. The best that can be done
in this case is to assume that the line is straight, in which case the
mean square for deviations from linearity, 46.393, will be an estimate of
the error variance (see § 12.6). Following this procedure shows that a
value of b differing from zero by as much as or more than that observed
(0.9715) would be expected in between 2 and 5 per cent of repeated
experiments if $\beta$ were zero. This suggests, though not very conclusively,
that y really does increase with x (see § 6.1).

Table 12.5.2

Source of variation          d.f.          SSD        MS         F       P

Linear regression            1             497.260    497.260    10.72   0.02-0.05
Deviations from linearity    N-2 = 4       185.573     46.393
Total                        N-1 = 5       682.833
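The entries of the analysis of variance can be recomputed directly from the observations. This is a minimal sketch, again reading the six pairs of Table 12.5.1 as x = 160, 165, 169, 175, 180, 188 and y = 59, 54, 64, 67, 85, 78; the names are illustrative, and P is not computed because the text reads it from tables of F.

```python
# Analysis of variance for a straight line with one observation at each x.
x = [160, 165, 169, 175, 180, 188]
y = [59, 54, 64, 67, 85, 78]
n = len(y)

total_ss = sum(v * v for v in y) - sum(y) ** 2 / n            # total SSD, N - 1 d.f.
sxx = sum(v * v for v in x) - sum(x) ** 2 / n
sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n

regression_ss = sxy ** 2 / sxx                                # 1 d.f.
deviations_ss = total_ss - regression_ss                      # by difference, N - 2 d.f.

regression_ms = regression_ss / 1
deviations_ms = deviations_ss / (n - 2)                       # used as the error variance
f_ratio = regression_ms / deviations_ms

print(round(total_ss, 3), round(regression_ss, 3),
      round(deviations_ss, 3), round(deviations_ms, 3), round(f_ratio, 2))
```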

Gaussian confidence limits for the population line 

The error variance of the 6 observations (the part of their variance
not accounted for by a linear relationship with x) is estimated, from
Table 12.5.2, to be var[y] = 46.393 with 4 d.f. The value of Student's
t for P = 0.95 and 4 d.f. is 2.776 (from tables, see § 4.4). The confidence
limits for the population value of Y at various values of x can be
found from (12.4.5). To evaluate var[Y] from (12.4.4) at each value of x,
the values var[y] = 46.393, N = 6, $\bar{x}$ = 172.833 and $\sum (x - \bar{x})^2$ = 526.833
are needed. These have already been calculated and are the same at all
values of x. Enough values of x must be used in (12.4.4) to allow smooth
curves to be drawn (the broken lines in Fig. 12.5.1). Three representative
calculations are given.

(a) x = 200. At this point, by (12.5.1), the estimated value of Y is

$$Y = 67.833 + 0.9715(200 - 172.833) = 94.23$$

and, by (12.4.4),

$$\mathrm{var}[Y] = 46.393\left(\frac{1}{6} + \frac{(200 - 172.833)^2}{526.833}\right) = 72.72.$$

The Gaussian confidence limits for the population value of
Y at x = 200 are thus, by (12.4.5), $94.23 \pm 2.776\sqrt{72.72}$, i.e.
from Y = 70.56 to Y = 117.90; these are plotted in Fig. 12.5.1
at x = 200.

(b) x = 172.833 = $\bar{x}$. At this point $(x - \bar{x}) = 0$ so, from (12.5.1),
Y = 67.833 = $\bar{y}$. From (12.4.4) var[Y] = var[y](1/N) = 46.393/6
= 7.73, and the confidence limits for the population value of Y
are, by (12.4.5), $67.833 \pm 2.776\sqrt{7.73}$, i.e. from 60.11 to 75.55.

(c) x = 0. At this point, the intercept on the y axis, Y = 67.833
+ 0.9715(0 - 172.833) = -100.1. This is, of course, a considerable
extrapolation beyond the range of the experimental results. From
(12.4.4),

$$\mathrm{var}[Y] = 46.393\left(\frac{1}{6} + \frac{(0 - 172.833)^2}{526.833}\right) = 2638,$$

which is far larger than when x is nearer $\bar{x}$. The confidence limits are
$-100.1 \pm 2.776\sqrt{2638}$, i.e. from -243 to +42.5.
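The three representative calculations can be repeated for any x with a short function. This is a sketch only: the constants are the rounded values quoted in the text, so the results agree with the printed figures to within rounding, and the name `line_limits` is invented for illustration.

```python
import math

# Rounded summary values quoted in the text for this example
A, B, X_BAR = 67.833, 0.9715, 172.833      # a, b, and mean of x
SXX, VAR_Y, N = 526.833, 46.393, 6         # sum of squares for x, error variance, sample size
T_95_4DF = 2.776                           # tabulated Student's t, P = 0.95, 4 d.f.


def line_limits(x):
    """Return (Y, var[Y], lower limit, upper limit) for the population line at x."""
    y_hat = A + B * (x - X_BAR)
    var_y_hat = VAR_Y * (1.0 / N + (x - X_BAR) ** 2 / SXX)
    half_width = T_95_4DF * math.sqrt(var_y_hat)
    return y_hat, var_y_hat, y_hat - half_width, y_hat + half_width


for x in (200.0, X_BAR, 0.0):
    y_hat, var_y_hat, lower, upper = line_limits(x)
    print(f"x = {x:8.3f}  Y = {y_hat:8.2f}  var[Y] = {var_y_hat:8.2f}  "
          f"limits {lower:8.2f} to {upper:8.2f}")
```

Evaluating `line_limits` over a grid of x values gives the points needed to draw the curved limits.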

The confidence limits are much wider at the ends than at the central
portion of the curve, which illustrates the grave uncertainty involved
in extrapolation beyond the observations. Moreover it must be
remembered that these confidence limits assume that the population
(true) line is really straight in the region of extrapolation. There is, of
course, no reason (from the evidence of this experiment) to assume this.
In fact with only one observation at each x value linearity could not be
tested even within the range of the observations. The uncertainty in
the extrapolated intercept, Y = -100.1, at x = 0 is therefore really
even greater than indicated by the very wide confidence limits which
extend from -243 to +42.5 (even apart from the further uncertainties
discussed in § 7.2). The intercept does not differ 'significantly' from
zero (or even from +40 or -240) 'at the P = 0.05 level'.

Testing a hypothetical value with the t test 

As in § 9.4 the confidence limits can be interpreted as a t test, and
this will make it clear that the (rather undesirable) expression 'not
significant at the P = 0.05 level' means the result of the test is P
> 0.05. For example to test the hypothesis that the population value of
the intercept is $\mu$ = +40, calculate, from (4.4.1),

$$t = (Y - \mu)/\sqrt{\mathrm{var}[Y]} = (-100.1 - 40)/\sqrt{2638} = -2.728$$

with 4 degrees of freedom. Referring t = 2.728 to a table (see § 4.4)
of Student's t distribution shows P > 0.05 (two tail; see § 6.1).
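The same test can be written out as a short calculation. This is a sketch using the rounded values from the text; the critical value 2.776 (P = 0.95, 4 d.f.) is taken from tables rather than computed, and the variable names are illustrative.

```python
import math

y_hat = -100.1       # extrapolated intercept at x = 0
var_y_hat = 2638.0   # its estimated variance
mu_0 = 40.0          # hypothetical population value to be tested
t_crit = 2.776       # tabulated Student's t for P = 0.95 (two tail), 4 d.f.

t = (y_hat - mu_0) / math.sqrt(var_y_hat)

# |t| below the critical value corresponds to P > 0.05, i.e. the hypothetical
# value lies inside the 95 per cent confidence limits.
inside_limits = abs(t) < t_crit

print(round(t, 3), inside_limits)
```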




The curvature of the confidence limits for the population line is
only common sense because there is uncertainty in the value of a,
i.e. in the vertical position of the line, as well as in b, its slope. If lines
with the steepest and shallowest reasonable slopes (confidence limits for
$\beta$) are drawn for the various reasonable values of a the area of
uncertainty will have the outline shown by the broken lines in Fig. 12.5.1.
Another numerical example (with unequal numbers of observations at
each point) is worked out in § 13.14.



Confidence limits for the slope. Identity of the analysis of variance with a 
t test 

In § 12.4 it was mentioned that the slope will be normally distributed
if the observations are, with variance given by (12.4.2). In this example
b = 0.9715 and var[b] = 46.393/526.833 = 0.08806. The 95 per cent
confidence limits, using t = 2.776 as above, are thus, by (12.4.3),
$0.9715 \pm 2.776\sqrt{0.08806}$, i.e. from 0.15 to 1.80. These limits do not
include zero, indicating that b 'differs significantly from zero at the
P = 0.05 level'.

As above, and as in § 9.4, this can be put as a t test. The Gaussian
(normal) variable of interest is b, and the hypothesis is that its
population value ($\beta$) is zero so, by (4.4.1),

$$t = \frac{b - 0}{\sqrt{\mathrm{var}[b]}} = \frac{0.9715 - 0}{\sqrt{0.08806}} = 3.274$$

with 4 degrees of freedom (the number associated with var[y], from
which var[b] was found). Referring to tables (see § 4.4) of Student's
t distribution shows that the probability of a value of t as large as, or
larger than, 3.274 occurring is between 0.02 and 0.05, as inferred from
the confidence limits.

It was mentioned in § 11.3 (and illustrated in § 11.4) that the variance
ratio, F, with 1 d.f. for the numerator and f for the denominator is
simply a value of $t^2$ with f degrees of freedom. In this case $t^2$ with
4 d.f. = $3.274^2$ = 10.72 = F with 1 d.f. for the numerator and 4 for the
denominator. This is exactly the value of F found in Table 12.5.2
(and P = 0.02-0.05 exactly as in Table 12.5.2). It is easy to show
that this t test is, in general, algebraically identical with the analysis
of variance, so the component in the analysis of variance labelled
'linear regression' is simply a test of the hypothesis that the population
value of the slope of the straight line through the results is zero. This
approach also, incidentally, makes it clear why this component in the
analysis of variance should have one degree of freedom.
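The identity of the t test on the slope with the variance ratio can be confirmed numerically. The sketch below uses the rounded values quoted in this section; the variable names are illustrative and the tabulated t = 2.776 is assumed rather than computed.

```python
import math

b = 0.9715          # estimated slope
var_y = 46.393      # error mean square, 4 d.f.
sxx = 526.833       # sum of squares for x
t_crit = 2.776      # tabulated Student's t for P = 0.95, 4 d.f.

var_b = var_y / sxx                       # variance of the slope
half_width = t_crit * math.sqrt(var_b)
lower, upper = b - half_width, b + half_width

t = (b - 0.0) / math.sqrt(var_b)          # t test of the hypothesis beta = 0
f_from_t = t ** 2                         # t squared with 4 d.f.
f_from_anova = 497.260 / 46.393           # ratio of mean squares from the table

print(round(var_b, 5), round(lower, 2), round(upper, 2),
      round(t, 3), round(f_from_t, 2), round(f_from_anova, 2))
```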

12.6. Fitting a straight line with several observations at each x 
value. The use of a linearizing transformation for an 
exponential curve and the error of the half-life 

The figures in Table 12.6.1 are the results of an experiment on the
destruction of adrenaline by liver tissue in vitro. Three replicate
determinations (n = 3) of adrenaline concentration (the dependent
variable, y) were made at each of k = 5 times (the independent variable,
x). The figures are based on the experiments of Bain and Batty (1956).



Table 12.6.1
Values of adrenaline (epinephrine) concentration, y (μg/ml)

                         Time, x (min)

              6       18       30       42       54     Total

           30.0      8.9      4.1      1.8      0.8
           28.6      8.0      4.6      2.6      0.6
           28.5     10.8      4.7      2.2      1.0

Totals     87.1     27.7     13.4      6.6      2.4     137.2



The decrease in adrenaline concentration with time plotted in Fig.
12.6.1 is apparently not linear. Because there is more than one
observation at each point in this experiment it is possible to make an estimate
of experimental error without assuming the true line to be straight
(cf. § 12.5). Therefore it is possible to judge whether or not it is reasonable
to attribute the observed deviations from linearity to experimental error.
The assumptions of the analysis have been discussed in §§ 6.1, 7.2, 11.2,
12.1, and 12.2 which should be read first. There are not enough results
for any of the assumptions to be checked satisfactorily (see §§ 4.6 and 11.2).

The basic analysis is exactly the same as the one way analysis of
variance described in § 11.4, the 'treatments' in this case being the
different x values (times). As in § 11.4, it is not necessary to have the
same number of observations in each sample (at each time). If the three
rows in Table 12.6.1 had corresponded to three blocks (e.g. if three
different observers had been responsible for the observations in the
first, second, and third rows) then the two-way analysis described in
§ 11.6 would have been appropriate, with a between-rows (between
blocks, between observers) component in the analysis of variance. The
additional factor, compared with the one-way analysis in § 11.4, is
that part of the differences 'between treatments' (i.e. between the mean
concentrations at the five different times) can be accounted for by a
linear change of concentration with time (see § 12.3).

Fig. 12.6.1. Observed mean adrenaline concentration ($\bar{y}_{.j}$) plotted
against time (x). Data of Bain and Batty (1956) from Table 12.6.1.

Calculating the analysis of variance of y 

The first part is exactly as in § 11.4 (where more details will be found).

(1) Correction factor $G^2/N = (137.2)^2/15 = 1254.923$.

(2) Total sum of squares, from (2.6.6) (cf. (11.4.3) and refer to § 2.1 if
you are confused by the notation)

$$= 30.0^2 + 8.9^2 + \dots + 2.2^2 + 1.0^2 - 1254.923 = 1612.037.$$

(3) Sum of squares (SSD) between columns (i.e. between the
concentrations at different times), by (11.4.5), is

$$\text{SSD between times} = \frac{87.1^2}{3} + \frac{27.7^2}{3} + \dots + \frac{2.4^2}{3} - 1254.923 = 1605.937.$$



This SSD can be split into two components, just as in § 12.5. In this 
case the calculations could be made easier by transforming the 
independent variable (x), as shown at the end of this section. But, 
for generality, the full calculation will be given first. 



(a) Sum of squares due to linear regression. This is found from (12.3.4)

$$\mathrm{SSD} = \left[\sum (y - \bar{y})(x - \bar{x})\right]^2 / \sum (x - \bar{x})^2.$$

It is easy to make a mistake at this stage by supposing that there
are only five x values, when in fact there are N = 15 values. This
will be avoided if the N = 15 pairs of observations are written
out in full, as in Table 12.5.1, rather than in the condensed form
shown in Table 12.6.1. This is shown in Table 12.6.2.

Table 12.6.2

              x        y

              6     30.0
              6     28.6
              6     28.5
             18      8.9
              .        .
              .        .
             54      0.8
             54      0.6
             54      1.0

Totals      450    137.2



Firstly find the sum of products using (2.6.7) and Table 12.6.2:

$$\begin{aligned}
\sum (y - \bar{y})(x - \bar{x}) &= \sum xy - \frac{(\sum x)(\sum y)}{N} \\
&= (6 \times 30.0) + (6 \times 28.6) + \dots + (54 \times 1.0) - \frac{(450)(137.2)}{15} \\
&= -2286.000.
\end{aligned}$$

Notice that $\sum x = 3(6 + 18 + 30 + 42 + 54) = 450$ (compare Table
12.6.2), because each x occurs three times; and also that the
calculation of the sum of products can be shortened by using the
group totals from Table 12.6.1 giving

$$(6 \times 87.1) + (18 \times 27.7) + \dots + (54 \times 2.4) - \frac{(450)(137.2)}{15} = -2286.000. \qquad (12.6.1)$$




Secondly, find the sum of squares for x. From (2.6.6)

$$\sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{N} = 6^2 + 6^2 + 6^2 + 18^2 + \dots + 54^2 - \frac{450^2}{15} = 4320.000. \qquad (12.6.2)$$

From (12.3.4) the SSD due to linear regression now follows

$$\mathrm{SSD} = \frac{(-2286.00)^2}{4320.000} = 1209.68.$$

(b) SSD for deviations from linearity. As in § 12.5 this is most easily
found by difference (cf. (12.3.3))

SSD due to deviations from linearity = SSD between x values -
SSD due to linear regression (12.6.3)

= 1605.937 - 1209.68
= 396.26.

(4) SSD for error. This is simply the within groups SSD of § 11.4.
The experimental error is assessed from the scatter of replicate
observations at each x value. It has N - k = 15 - 5 = 10 degrees of freedom
and is most easily found by difference as in § 11.4, thus 1612.037
- 1605.937 = 6.100.

These results can now be assembled in an analysis of variance table
(Table 12.6.3), the bottom part resembling Table 11.4.2, the top part
resembling Table 12.5.2 (except that the number of different x values, k,
is no longer the same as the total number of observations, N).

Table 12.6.3
Gaussian analysis of variance of y

Source                            d.f.          SSD         MS          F        P

Linear regression                 1             1209.68     1209.68     1983     < 0.001
Deviations from linearity         k-2 = 3        396.26      132.09      216.5   < 0.001
Between x values (times)          k-1 = 4       1605.937     401.48      658.2   < 0.001
Error (within x values (times))   N-k = 10         6.100       0.6100
Total                             N-1 = 14      1612.037

The table is completed as described in Chapter 11 and § 12.5. Each
mean square would be an estimate of $\sigma^2$ if the null hypothesis that all
15 observations came from a single normal population (with variance
$\sigma^2$) were true. The ratio of each mean square to the error mean square is
referred to tables of the F ratio (see § 11.3), to see whether it is larger
than could be expected by chance. Although a considerable part of the
differences between the mean adrenaline concentrations at different
times ('between times') is accounted for by a linear relationship between
concentration (y) and time (x), the remainder ('deviations from linearity')
is still much larger than could be reasonably expected if the true line
were straight. $P \ll 0.001$, i.e. deviations from linearity as large as
those observed, or larger, would occur in far fewer than 1 in 1000
repeated experiments if the true line were straight, and if the
assumptions about normal distributions, etc. (see § 12.2), made in the
calculations are sufficiently nearly true.
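The whole analysis of variance with replication can be reproduced from the fifteen observations. The sketch below reads the adrenaline concentrations of Table 12.6.1 as 30.0, 28.6, 28.5 (6 min), 8.9, 8.0, 10.8 (18 min), 4.1, 4.6, 4.7 (30 min), 1.8, 2.6, 2.2 (42 min) and 0.8, 0.6, 1.0 (54 min), which reproduces the column totals; the dictionary layout and variable names are illustrative choices, not part of the text.

```python
# Adrenaline concentrations (ug/ml) keyed by time in minutes
conc = {
    6: [30.0, 28.6, 28.5],
    18: [8.9, 8.0, 10.8],
    30: [4.1, 4.6, 4.7],
    42: [1.8, 2.6, 2.2],
    54: [0.8, 0.6, 1.0],
}

# Write out all N = 15 (x, y) pairs in full, each x repeated once per replicate
x = [t for t, ys in conc.items() for _ in ys]
y = [v for ys in conc.values() for v in ys]
n, k = len(y), len(conc)

correction = sum(y) ** 2 / n
total_ss = sum(v * v for v in y) - correction
between_ss = sum(sum(ys) ** 2 / len(ys) for ys in conc.values()) - correction

sxy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
sxx = sum(xi * xi for xi in x) - sum(x) ** 2 / n

regression_ss = sxy ** 2 / sxx                 # 1 d.f.
deviations_ss = between_ss - regression_ss     # k - 2 d.f.
error_ss = total_ss - between_ss               # N - k d.f.

error_ms = error_ss / (n - k)
f_regression = regression_ss / error_ms
f_deviations = (deviations_ss / (k - 2)) / error_ms
f_between = (between_ss / (k - 1)) / error_ms

print(round(total_ss, 3), round(between_ss, 3), round(regression_ss, 2),
      round(deviations_ss, 2), round(error_ss, 3))
print(round(f_regression), round(f_deviations, 1), round(f_between, 1))
```

Writing the pairs out in full, as the code does for `x` and `y`, avoids the mistake of treating the five distinct times as five observations.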

There are now two possibilities. Either a curve can be fitted directly 
to the observations (see §§12.7 and 12.8), or a transformation can be 
sought that converts the graph to a straight line. The latter approach is 
now described. 

A linearizing transformation. Does the catabolism of adrenaline follow an 
exponential time course ? 

If the rate of catabolism of adrenaline by liver tissue at any given
moment were proportional to the concentration of adrenaline (y)
present at that time (x) then the concentration of adrenaline (Table
12.6.1) would be expected to fall exponentially, i.e.

$$y = y_0 e^{-kx}, \qquad (12.6.4)$$

where $y_0$ is the concentration present at time x = 0 and k is the rate
constant.† The reciprocal of k, the time constant, is the time taken
for the concentration to fall to 100/e ≈ 36.8 per cent of its original
value (when x = 1/k, it follows from (12.6.4) that $y = y_0/e$). Taking
natural logarithms (logs to base e) of (12.6.4) gives

$$\log_e y = \log_e y_0 - kx. \qquad (12.6.5)$$

† The symbol k has already been used for the number of treatments (times), but
there should be no risk of confusion between its two meanings.






(Remember the log is the power to which the base must be raised to
give the argument, so $\log_e e^{-kx} = -kx$.) Therefore there is a straight
line relation between log y and x, with slope -k and intercept $\log_e y_0$.
The half-life of adrenaline is related to the rate constant in a simple
way. Putting $y = y_0/2$ in (12.6.5) gives the half-life as

$$x_{0.5} = \frac{\log_e 2}{k} = \frac{0.69315}{k}. \qquad (12.6.6)$$

The interpretation of the rate constant in molecular terms is discussed in
§ A2.3.

Common logarithms (to base 10) are more easily available than
natural logarithms so it will be convenient to write (12.6.5) in terms of
common logarithms. Dividing through by $\log_e 10 \approx 2.3026$ gives, using
(13.3.5),

$$\log_{10} y = \log_{10} y_0 - \frac{kx}{2.3026}, \qquad (12.6.7)$$

a straight line with slope = -k/2.3026 and an intercept $\log_{10} y_0$.
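The relations between the rate constant, the time constant, the half-life, and the slope of the common-logarithm line can be checked with a short sketch. The values of `y0` and `k` below are hypothetical and chosen only for illustration; the function names are likewise invented here.

```python
import math


def concentration(x, y0, k):
    """Exponential decay y = y0 * exp(-k x)."""
    return y0 * math.exp(-k * x)


def half_life(k):
    """Time for the concentration to halve: log_e(2) / k."""
    return math.log(2) / k


def rate_constant_from_log10_slope(slope):
    """Recover k from the slope of log10(y) against x, since slope = -k / log_e(10)."""
    return -slope * math.log(10)


y0, k = 10.0, 0.05           # hypothetical initial concentration and rate constant

y_at_half_life = concentration(half_life(k), y0, k)         # should be y0 / 2
fraction_at_time_constant = concentration(1 / k, y0, k) / y0  # should be 1/e, about 0.368

# Slope of the common-log line implied by k, and k recovered from it
slope_log10 = -k / math.log(10)
k_recovered = rate_constant_from_log10_slope(slope_log10)

print(y_at_half_life, fraction_at_time_constant, k_recovered)
```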

In order to do the following analysis it is necessary to assume that the
values of log y at each x value are normally distributed (i.e. that y is
lognormal, § 4.6) and homoscedastic (see § 12.2). These assumptions,
of course, contradict those just made in doing the analysis of variance
of y (Table 12.6.3), when y itself was supposed normal and
homoscedastic (see the discussions of transformations in §§ 11.2 and 12.2;
this problem would not arise if the transformation was made on the
independent variable x). There is no way of telling how likely it is that
this contradiction will give rise to misleading inferences in particular
cases. In the absence of real knowledge about the distributions of the
observations the analysis will, as previously emphasized (see §§ 4.2,
6.2, and 7.2), be in error to some unknown extent. If y were known to
be normally distributed the methods of § 12.8 would be preferred to
that now described.

Table 12.6.4
Values of $\log_{10} y$ found from Table 12.6.1

                            Time, x (min)

               6         18         30         42         54      Total

          1.4771     0.9494     0.6128     0.2553    -0.0969
          1.4564     0.9031     0.6628     0.4150    -0.2218
          1.4548     1.0334     0.6721     0.3424     0.0000

Total     4.3883     2.8859     1.9477     1.0127    -0.3187     9.9159

To see whether the straight line defined by (12.6.7) fits the observa- 
tions, the logarithms of the observations are tabulated in Table 12.6.4. 

Table 12.6.5
Gaussian analysis of variance of $\log_{10} y$

Source                       d.f.    SSD        MS          F        P

Linear regression             1      4.2467     4.2467      874.1    < 0.001
Deviations from linearity     3      0.0337     0.0112        2.31   0.1-0.2
Between times                 4      4.2804     1.0701      220.3    < 0.001
Error                        10      0.04858    0.004858
Total                        14      4.3290

Fig. 12.6.2. Same data as Fig. 12.6.1, but the mean value of the $\log_{10}$
adrenaline concentration (from Table 12.6.4) is plotted against time. The line is
that found by the method of least squares, eqn. (12.6.12).

The mean log concentrations are plotted against time in Fig. 12.6.2.
The graph looks much straighter than Fig. 12.6.1. The analysis of
variance of the log concentrations in Table 12.6.4 is now calculated in
exactly the same way as the calculation of the analysis of variance of
the concentrations themselves (Table 12.6.1). The result is Table
12.6.5. Compare Table 12.6.3.

The results in Table 12.6.5 show that almost all the variation of the
log concentrations between times is accounted for by a straight line
relationship between log y and time, and the evidence against the null
hypothesis that the true slope of this line, $\beta$, is zero is very strong.
Deviations from linearity as large as, or larger than, those observed
would be expected to occur in between 10 and 20 per cent of repeated
experiments if the true population line were straight (if the assumptions
made are correct). There is, therefore, no compelling reason to believe
that the relation between log y and time is non-linear, i.e. the
experiment provides no evidence that eqn. (12.6.7), and hence also eqn. (12.6.4),
fit the observations inadequately. In other words there is no reason
to believe that the concentration of adrenaline does not decay
exponentially.

Having established that it is reasonable to fit a straight line to the 
log concentrations, the next step is to estimate the parameters (slope 
and intercept) of the line. 

Fitting the straight line 
If the log observations are denoted y', i.e.

$$y' = \log_{10} y, \qquad (12.6.8)$$

the equation to be fitted (12.6.7) can be written as

$$\log_{10} Y \equiv Y' = a + b(x - \bar{x}), \qquad (12.6.9)$$

which has the same form as in previous examples in this chapter.
Using (12.2.6) the estimate of a is

$$a = \bar{y}' = \frac{9.9159}{15} = 0.66106.$$

To estimate the slope, the sum of products is first found as described
in the analysis of untransformed concentrations (eqns. (12.6.1) and
(2.6.7))

$$\begin{aligned}
\sum (y' - \bar{y}')(x - \bar{x}) &= \sum xy' - \frac{(\sum x)(\sum y')}{N} \\
&= (6 \times 4.3883) + (18 \times 2.8859) + \dots + (54 \times -0.3187) - \frac{(450)(9.9159)}{15} \\
&= -135.446. \qquad (12.6.10)
\end{aligned}$$



This is negative because y' decreases with x (see § 2.6). The sum of 
squares for $x$ is found as in eqn. (12.6.2), and is 4320.000 as before. The
slope is therefore estimated, by (12.2.8), to be

$$b = \frac{\sum (y' - \bar{y}')(x - \bar{x})}{\sum (x - \bar{x})^2} = \frac{-135.446}{4320.000} = -0.03135. \tag{12.6.11}$$

Putting these values, and $\bar{x} = 450/15 = 30.000$ as before, into (12.6.9)
gives the least squares estimate of the straight line as

$$\log_{10} Y = 0.66106 - 0.03135(x - 30) = 1.6016 - 0.03135x. \tag{12.6.12}$$

Comparing this with (12.6.7) gives the estimates of the parameters as

$$\log_{10} y_0 = 1.6016, \quad \text{so} \quad y_0 = 39.96\ \mu\text{g/ml} \tag{12.6.13}$$

and

$$-k/2.3026 = -0.03135, \quad \text{so} \quad k = 0.07219\ \text{min}^{-1}. \tag{12.6.14}$$

In its original form (12.6.4) the estimated regression equation is thus

$$Y = 39.96\,e^{-0.07219x}. \tag{12.6.15}$$



The time constant (discussed above) for adrenaline catabolism is
estimated to be

$$1/k = 1/0.07219 = 13.85\ \text{min}, \tag{12.6.16}$$

and from (12.6.6) the half-life of adrenaline is

$$x_{0.5} = \frac{0.69315}{k} = 9.602\ \text{min}. \tag{12.6.17}$$

It is shown in § A2.3 that $k^{-1} = 13.85$ min can be interpreted as the
mean lifetime of adrenaline molecules, and $x_{0.5} = 9.602$ min can be
interpreted as the median lifetime.
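The conversion from the fitted log line to the decay parameters can be checked with a few lines of Python. This is only a sketch of the arithmetic above: the variable names are illustrative, and the inputs are the rounded values of $a$, $b$ and $\bar{x}$ quoted in the text, so the last digit may differ slightly from the printed results.

```python
import math

# Rounded estimates quoted in the text for log10 Y = a + b(x - xbar)
a = 0.66106       # mean of the log10 observations
b = -0.03135      # slope, log10 units per minute
xbar = 30.0       # mean time, minutes

LN10 = math.log(10)            # 2.3026, converts a log10 slope to a natural-log rate

log10_y0 = a - b * xbar        # intercept of the line at x = 0
y0 = 10 ** log10_y0            # estimated initial concentration (ug/ml)
k = -LN10 * b                  # rate constant (1/min)
time_constant = 1 / k          # mean lifetime (min)
half_life = math.log(2) / k    # median lifetime (min)

print(f"log10 y0 = {log10_y0:.4f}, y0 = {y0:.2f}, k = {k:.5f}")
print(f"time constant = {time_constant:.2f} min, half-life = {half_life:.3f} min")
```

Because the half-life and the time constant are both fixed multiples of $1/k$, any error in the slope carries over to them in the same proportion.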

Confidence limits for the half-life. From the analysis of variance
(Table 12.6.5), the variance of the log observations is estimated as
$\mathrm{var}[y'] = 0.004858$ (the error mean square with 10 d.f.). Thus, using
(12.4.2) and (12.6.2), the variance of the estimated slope ($b$ in eqn.
(12.6.9)) is $\mathrm{var}[b] = \mathrm{var}[y']/\sum(x - \bar{x})^2 = 0.004858/4320.000
= 1.125 \times 10^{-6}$. The value of Student's $t$ for $P = 0.95$ and 10 d.f.
(from tables, see § 4.4) is 2.228, so 95 per cent confidence limits for $b$
follow from (12.4.3) as $b \pm t\sqrt{\mathrm{var}[b]} = -0.03135 \pm
2.228\sqrt{1.125 \times 10^{-6}} = -0.03371$ to $-0.02899$. The values for the
half-life corresponding to these values of $b$ are now found as above
($x_{0.5} = 0.69315/(-2.3026b)$).
Because 2.3026 and 0.69315 are constants, not variables, no additional
error enters in the conversion of $b$ to $x_{0.5}$. The 95 per cent Gaussian
confidence limits for the true half-life are thus 8.930 to 10.38 min. As
usual these limits can be interpreted as in § 7.9 only if all the
assumptions discussed in §§ 7.2, 11.2, and 12.2 are fulfilled. And, as usual,
the limits are likely to be optimistic (see § 7.2).
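The limits for the half-life can be reproduced as follows. This is a minimal sketch using the figures quoted above; the value of $t$ is taken from tables as in the text rather than computed, and the function name `half_life` is illustrative only.

```python
import math

b = -0.03135          # estimated slope (log10 units per min)
error_ms = 0.004858   # error mean square of the log observations, 10 d.f.
ss_x = 4320.0         # sum of squares of x about its mean
t_crit = 2.228        # Student's t for P = 0.95 and 10 d.f. (from tables)

var_b = error_ms / ss_x                    # variance of the slope
half_width = t_crit * math.sqrt(var_b)     # t * standard deviation of b
b_low, b_high = b - half_width, b + half_width

def half_life(slope):
    """Half-life corresponding to a log10 slope: 0.69315 / (-2.3026 * slope)."""
    return math.log(2) / (-math.log(10) * slope)

# The steeper slope gives the shorter half-life, so sort the two results.
limits = sorted((half_life(b_low), half_life(b_high)))
print(f"var[b] = {var_b:.4g}")
print(f"slope limits: {b_low:.5f} to {b_high:.5f}")
print(f"half-life limits: {limits[0]:.3f} to {limits[1]:.2f} min")
```

The limits for the half-life are not symmetrical about 9.602 min, because the half-life is proportional to the reciprocal of the slope.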

A simplifying transformation of the x values 

When, as in the present example, the x values are equally spaced and 
there is the same number of observations at each, the values of x can be 
transformed to make the arithmetic simpler. If $x'$ is defined as $x/12 - 2½$
the scale becomes

    x        6     18     30     42     54
    x/12     ½     1½     2½     3½     4½
    x'      -2     -1      0     +1     +2



Thus $\sum x' = -2-2-2-1-1-1+0+0+0+1+1+1+2+2+2 = 0$,
so $\bar{x}' = 0$. It follows that $\sum (x' - \bar{x}')^2 = \sum x'^2 = 3(2^2+1^2+0^2+1^2+2^2)
= 30$ and $\sum (y' - \bar{y}')(x' - \bar{x}') = \sum y'(x' - \bar{x}') = \sum y'x' = (-2 \times 4.3883) +
\dots + (+2 \times -0.3187) = -11.2872$. These simplified calculations give, of
course, the regression equation $Y' = a + b(x' - \bar{x}') = a + bx'$ for the plot
of $\log y$ against $x'$. The result is $Y' = 0.6611 - 0.3762x'$. Inserting the
definition of $x'$ gives $Y' = 0.6611 - 0.3762(x/12 - 2½) = 1.602 -
0.03135x$, exactly as above (eqn. (12.6.12)).
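The back-conversion from the coded scale can be verified numerically. The sketch below uses only the sums quoted in the text (the individual totals at each time are not all reproduced here), and the variable names are illustrative.

```python
# Coded x values: three observations at each of x' = -2, -1, 0, +1, +2
x_coded = [-2] * 3 + [-1] * 3 + [0] * 3 + [1] * 3 + [2] * 3

sum_x_coded = sum(x_coded)                  # should be zero
ss_x_coded = sum(v * v for v in x_coded)    # sum of squares of x'

sum_y_x_coded = -11.2872    # sum of y'x' quoted in the text
a = 0.6611                  # mean of the log observations (rounded)

b_coded = sum_y_x_coded / ss_x_coded        # slope per unit of x'

# x' = x/12 - 2.5, so Y' = a + b_coded * (x/12 - 2.5)
slope_x = b_coded / 12                      # slope per minute
intercept_x = a - 2.5 * b_coded             # intercept at x = 0

print(sum_x_coded, ss_x_coded)
print(f"Y' = {a:.4f} {b_coded:+.4f} x'")
print(f"Y' = {intercept_x:.3f} {slope_x:+.5f} x")
```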

12.7. Linearity, non-linearity, and the search for the optimum 

In real life most graphs are not straight lines. Sometimes, as in 
§ 12.6, they can be converted to lines that are near enough straight, but, 
as will be shown in § 12.8, this may be a hazardous process. Most 
elementary books do not discuss curves that are non-linear (in the 
sense to be defined in this section) because the mathematics is incon- 
venient to do by hand. Since most relationships that are based on some 
sort of physical model are non-linear, this is unfortunate. A simple 
computer method for fitting non-linear models will be given in § 12.8. 
Before this the principles of finding least squares estimates will be 
discussed, mainly in a pictorial way, and an attempt made to give an 
idea of the scope of linear (in the general sense) models. 




Finding least squares solutions. The geometrical meaning of the algebra 

In § 12.2 the least squares estimates, $\hat{a}$ and $\hat{b}$, of the parameters, $\alpha$
and $\beta$, of the straight line (12.2.3) were found algebraically. (In this
section, and in § 12.8, the symbols $\hat{a}$ and $\hat{b}$ will be used to distinguish
least squares estimates from other possible estimates of the parameters.)
It will be convenient to illustrate the approach to more complicated
curves by first going into the case of the straight line in greater
detail.

The intention is to find the values of the parameter estimates that
make the sum of the squares of the deviations of the observations
($y$) from the calculated values ($Y$), $S = \sum (y - Y)^2$ (eqns. (12.2.1) and
(12.2.5)), as small as possible. Notice that during the estimation procedure
the experimental observations are treated as constants (the particular
observations made) and various possible values of the parameters are
considered. The conventional way of finding a minimum, as in § 12.2,
is to differentiate and equate to zero. How this works was illustrated in
Fig. 12.2.2, in which $S$ was plotted against various possible values for
$a$ ($b$ being held constant). The slope of this graph (i.e. $\partial S/\partial a$) is zero at
the minimum, and the corresponding value of $a$ is taken as the least
squares estimate of $\alpha$. The curly $\partial$, indicating partial differentiation,
means that $b$ is treated as a constant when differentiating (12.2.5) to
obtain (12.2.6). This means that $b$ is given a fixed value which is
inserted, along with the experimental observations (from Table 12.7.1)
into (12.2.6) so that $S$ can be calculated for various values of $a$,
giving the curve plotted in Fig. 12.2.2. It may occur to you to ask
whether the value at which $b$ is held constant makes any difference to
the estimate of $\alpha$. In fact it does not because the expression found for
$\partial S/\partial a$ did not involve $b$, and similarly the expression for $\partial S/\partial b$ did not
involve $a$. The geometrical meaning of this will be made clear using the
data in Table 12.7.1.

Fitting a straight line in the form 

$$Y = a + b(x - \bar{x}) \tag{12.7.1}$$

gives

$$Y = 8.000 + 2.107(x - 1.0), \tag{12.7.2}$$

which is plotted in Fig. 12.7.1. The calculations and interpretation are 
the same as for the example in § 12.5. The corresponding analysis of 
variance in Table 12.7.2 shows that, if the true line were straight and the 
assumptions described in § 12.2 were true, then the slope of this line is 






Table 12.7.1

                x        y
               -2        1
               -1        4
                0        6
                1        9
                2       11
                3       10
                4       15

    Total       7       56
    Mean        1.0      8.0


Fig. 12.7.1. A straight line (eqn. (12.7.2)) fitted by the method of least
squares to the data in Table 12.7.1.




greater than could be reasonably expected if the population slope ($\beta$)
were zero.
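The analysis of variance for this example can be reproduced directly from the observations. The sketch below assumes the data of Table 12.7.1 with $y = 6$ at $x = 0$ (the value implied by the column total of 56); the variable names are illustrative only.

```python
# Data of Table 12.7.1 (y at x = 0 taken as 6, as implied by the total of 56)
x = [-2, -1, 0, 1, 2, 3, 4]
y = [1, 4, 6, 9, 11, 10, 15]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

sum_products = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
ss_x = sum((xi - xbar) ** 2 for xi in x)
ss_total = sum((yi - ybar) ** 2 for yi in y)

ss_linear = sum_products ** 2 / ss_x     # SS due to linear regression, 1 d.f.
ss_dev = ss_total - ss_linear            # SS for deviations from linearity, n - 2 d.f.
ms_dev = ss_dev / (n - 2)
f_ratio = ss_linear / ms_dev             # variance ratio with 1 and n - 2 d.f.

print(f"SS linear = {ss_linear:.3f}, SS deviations = {ss_dev:.3f}, total = {ss_total:.3f}")
print(f"MS deviations = {ms_dev:.3f}, F = {f_ratio:.2f}")
```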

The least squares estimates given in (12.7.2) are $\hat{a} = \bar{y} = 8.000$ and
$\hat{b} = 2.107$, calculated from eqns. (12.2.6) and (12.2.7). If the values of $x$
and $y$ from Table 12.7.1 are inserted in the expression for the sum of



Table 12.7.2

    Source                        d.f.       SS         MS         F         P
    Linear regression              1      124.321    124.321     80.95    <0.001
    Deviations from linearity      5        7.679      1.536
    Total                          6      132.000









squared deviations, $S$ (eqn. (12.2.5)), then $S$ can be calculated for
various possible values of $a$ and $b$. There are three variables here so
the results must be plotted as a three-dimensional graph. The most
convenient way to represent this on two dimensional paper is to plot a
contour map, the contours representing values of $S$ (the height), i.e.
the $a$- and $b$-axes are in the plane of the paper and the $S$-axis is sticking
up perpendicular to the paper. The result, calculated for the results in
Table 12.7.1 using eqn. (12.2.6), is shown in Fig. 12.7.2. The graph is
seen to represent a valley with elliptical contours. The bottommost
point of the valley corresponds to $\hat{a} = 8.000$ and $\hat{b} = 2.107$, i.e. the
least squares estimates already found.
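The valley can be explored numerically without drawing it. The following sketch evaluates $S$ over a coarse grid of $a$ and $b$ values, standing in for the contour map, and compares the lowest grid point with the algebraic estimates. The grid limits and spacing are arbitrary choices made for illustration, and the data are those of Table 12.7.1 with $y = 6$ at $x = 0$ (the value implied by the total).

```python
x = [-2, -1, 0, 1, 2, 3, 4]
y = [1, 4, 6, 9, 11, 10, 15]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

def S(a, b):
    """Sum of squared deviations for the line Y = a + b(x - xbar)."""
    return sum((yi - a - b * (xi - xbar)) ** 2 for xi, yi in zip(x, y))

# Algebraic least squares estimates
a_hat = ybar
b_hat = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
         / sum((xi - xbar) ** 2 for xi in x))
s_min = S(a_hat, b_hat)

# Coarse grid: a from 4.0 to 12.0 and b from 0.0 to 4.0, in steps of 0.1
grid = [(S(i / 10, j / 10), i / 10, j / 10)
        for i in range(40, 121) for j in range(0, 41)]
s_grid, a_grid, b_grid = min(grid)   # lowest point found on the grid

print(f"algebra: a = {a_hat:.3f}, b = {b_hat:.3f}, S = {s_min:.3f}")
print(f"grid:    a = {a_grid:.1f}, b = {b_grid:.1f}, S = {s_grid:.3f}")
```

The grid can only locate the bottom of the valley to within its own spacing, which is why the algebraic solution is preferred whenever it is available.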

It can now be shown why the value of $b$ used in constructing Fig.
12.2.2 did not matter. In Fig. 12.7.3 sections across the valley are shown
for $b = \hat{b} = 2.107$, $b = 3.0$ and $b = 3.4$. The lines along which these
sections have been taken are shown in Fig. 12.7.2. It can be seen that
wherever the section is taken (i.e. whatever value $b$ is held constant at),
the minimum in the curve occurs at the same place, viz. at $a = \hat{a}
= 8.00$. Similarly, if sections across the valley are taken at various
fixed $a$ values (i.e. at 90° to the sections illustrated), each section will
give a plot of $S$ against $b$, with slope $\partial S/\partial b$. The minima ($\partial S/\partial b = 0$) of
these curves will clearly all be at $b = \hat{b} = 2.107$ whatever the value of $a$.
Clearly this independence of $a$ and $b$ arises because the axes of the
ellipses in Fig. 12.7.2 are at right angles to the coordinates of the
graph (the ellipses are said to be in canonical form).

The fundamental form of the straight-line equation is $\mu = \alpha' + \beta x$,



Fig. 12.7.2. Contour map of the sum of squared deviations, $S$ (on an axis
perpendicular to the paper), plotted against various values of $a$ and $b$ using eqn.
(12.2.5) and the data in Table 12.7.1 (plotted in Fig. 12.7.1). The contours for a
straight line fitted in the form $Y = a + b(x - \bar{x})$ always have this appearance.
Values of $S$ are marked on the contours. The minimum value of $S$, at the bottom
of the valley, gives the least squares estimates as $\hat{a} = 8.000$ and $\hat{b} = 2.107$.
Sections across the valley, along the lines shown, are plotted in Fig. 12.7.3. The
lowest point on each line (minima in Fig. 12.7.3) is marked ×.

where $\alpha'$ is the intercept, $\beta$ the slope, and $\mu$ the population value of $y$.
Inserting the estimates of the parameters gives

$$Y = \hat{a}' + \hat{b}x \tag{12.7.3}$$

and, because (12.7.2) can be written as

$$Y = 5.893 + 2.107x, \tag{12.7.4}$$

it is seen that $\hat{a}' = 5.893$ and $\hat{b} = 2.107$. Comparison of (12.7.1) and
(12.7.3) shows that in general, as in § 12.2,

$$\hat{a}' = \hat{a} - \hat{b}\bar{x}. \tag{12.7.5}$$

Fig. 12.7.3. Sections across the valley along the lines indicated in Fig. 12.7.2
($S$ plotted against values of $a$; the minimum is at $\hat{a} = 8.00$).
The slope of the line, $\partial S/\partial a$, is zero when $S$ is at a minimum, as shown in Fig.
12.2.2. The value of $S$ at the bottom of the valley is 7.679 as shown, and as found
in Table 12.7.2.

It may be asked why (12.7.4) was arrived at indirectly, through the 
seemingly more complicated form, (12.7.2). Why not apply the method 
of least squares directly to (12.7.3)? The answer to this will become 
clear when it is tried. The method of least squares will now be applied 




to the straight line in the form of (12.7.3), in just the same way as it 
was applied in § 12.2 to the straight line in the form of (12.7.1). 

Denoting the observations $y$ and the values calculated from (12.7.3)
as $Y$, as in § 12.2, gives the sum of squared deviations, which is to be
minimized, as

$$\begin{aligned}
S &= \sum (y - Y)^2 = \sum (y - a' - bx)^2 \\
  &= \sum (y^2 + a'^2 + b^2x^2 - 2a'y - 2ybx + 2a'bx) \\
  &= \sum y^2 + Na'^2 + b^2\sum x^2 - 2a'\sum y - 2b\sum yx + 2a'b\sum x.
\end{aligned} \tag{12.7.6}$$

This is analogous to (12.2.6), but notice that this time the last term is
not zero. As in § 12.2, $S$ is differentiated with respect to $a'$, treating $b$ as a
constant, giving

$$\frac{\partial S}{\partial a'} = 2Na' - 2\sum y + 2b\sum x, \tag{12.7.7}$$

and equating this to zero to find the value of $a'$ for which $S$ is a minimum
(see Fig. 12.7.5) gives

$$Na' + b\sum x = \sum y. \tag{12.7.8}$$

The value of $a'$ for which $S$ is a minimum is no longer independent of $b$,
as shown by the presence of $b$ in (12.7.8), the solution of which will
depend on the value of $b$ chosen.

Differentiating (12.7.6) with respect to $b$, holding $a'$ constant, gives

$$\frac{\partial S}{\partial b} = 2b\sum x^2 - 2\sum yx + 2a'\sum x, \tag{12.7.9}$$

and again equating to zero gives

$$a'\sum x + b\sum x^2 = \sum yx. \tag{12.7.10}$$

Again unlike the result in § 12.2, the estimate of $b$ is seen to depend on
the value of $a'$.

The required solutions for $\hat{a}'$ and $\hat{b}$ are those for which (12.7.8) and
(12.7.10) are both true simultaneously. In fact, (12.7.8) and (12.7.10)
are a pair of (linear) simultaneous equations (known, in regression
analysis, as the normal equations), which can be solved for $a'$ and $b$
by school-book methods giving (with the values of $x$ and $y$ in Table
12.7.1) $\hat{a}' = 5.893$ and $\hat{b} = 2.107$ as found above.
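The 'school-book' solution of the normal equations can be written out explicitly. The sketch below solves (12.7.8) and (12.7.10) by elimination (Cramer's rule, chosen here only for brevity), using the data of Table 12.7.1 with $y = 6$ at $x = 0$ (the value implied by the total). The variable names are illustrative.

```python
x = [-2, -1, 0, 1, 2, 3, 4]
y = [1, 4, 6, 9, 11, 10, 15]
n = len(x)

sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Normal equations:
#   N a'     + b sum(x)   = sum(y)      (12.7.8)
#   a' sum(x) + b sum(x^2) = sum(yx)    (12.7.10)
det = n * sum_xx - sum_x * sum_x
a_prime = (sum_y * sum_xx - sum_x * sum_xy) / det
b = (n * sum_xy - sum_x * sum_y) / det

print(f"sum x = {sum_x}, sum y = {sum_y}, sum x^2 = {sum_xx}, sum xy = {sum_xy}")
print(f"a' = {a_prime:.3f}, b = {b:.3f}")
```

The determinant in the denominator is $N\sum(x - \bar{x})^2$, so a solution exists whenever the $x$ values are not all identical.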

What is the geometrical meaning of these results ? If contours are 
plotted from (12.7.6) (using the data in Table 12.7.1) the results are as 
shown in Fig. 12.7.4. 






The contours are still elliptical, but their axes are no longer parallel
with the coordinates of the graph. When sections are made across the
valley at the values of $b$ shown in Fig. 12.7.4, the results are as shown in
Fig. 12.7.5.

Fig. 12.7.4. Contour map of $S$ (values marked on the contours) for the same
data as Fig. 12.7.2, but with the straight line fitted in the form $Y = a' + bx$.
Sections across the valley, along the lines shown, are plotted in Fig. 12.7.5. The
lowest point along each line (minima in Fig. 12.7.5) is marked ×.

The value of $a'$ for which $S$ is a minimum is seen to depend on the value
at which $b$ was held constant when making the section across the valley,
as expected from (12.7.8). Of course, the slope of the curves in Fig.
12.7.5, $\partial S/\partial a'$, is zero at the minimum of each curve. But the only
point at which $\partial S/\partial b$ is simultaneously zero is at the bottommost
point of the valley in Fig. 12.7.4 (hence the simultaneous equations).
For example, on the curve for $b = 3.4$, in Fig. 12.7.5, $S$ is at a minimum
(i.e. $\partial S/\partial a' = 0$) at the point $a' = 4.6$. Inspection of Fig. 12.7.4 makes
it clear that if a section is made across the valley (at 90° to the sections
in Fig. 12.7.5) at $a' = 4.6$, giving a plot of $S$ against $b$ (with slope
$= \partial S/\partial b$), the minimum will not be at $b = 3.4$. That is to say, at the
point $a' = 4.6$, $b = 3.4$, $\partial S/\partial a'$ is zero but $\partial S/\partial b$ is not.

Fig. 12.7.5. Sections across the valley, along the lines shown in Fig. 12.7.4,
when a straight line is fitted in the form $Y = a' + bx$. The value of $S$ at the bottom
of the valley is 7.679 as before.

It is now clear that the effect of writing the straight line in the form
$Y = a + b(x - \bar{x})$ is to make the estimates $\hat{a}$ and $\hat{b}$ independent of each
other so two simple independent equations (derived in § 12.2) can be
used for their estimation. If the line is written in the form $Y = a' + bx$,
then the estimates are no longer independent, but must be found by
solving simultaneous equations.
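The contrast between the two forms of the line can be seen by finding, for several fixed slopes, the intercept that minimizes $S$. The sketch below uses the data of Table 12.7.1 (with $y = 6$ at $x = 0$, the value implied by the total); the function name `best_intercept` is illustrative and is not taken from the text.

```python
x = [-2, -1, 0, 1, 2, 3, 4]
y = [1, 4, 6, 9, 11, 10, 15]
n = len(x)
xbar = sum(x) / n

def best_intercept(b, u):
    """Intercept that minimizes S for Y = intercept + b*u with b held fixed.

    Equating dS/d(intercept) to zero gives intercept = mean(y - b*u).
    """
    return sum(yi - b * ui for yi, ui in zip(y, u)) / n

centred = [xi - xbar for xi in x]   # u = x - xbar, the form (12.7.1)
raw = list(x)                       # u = x,        the form (12.7.3)

for b in (2.107, 3.0, 3.4):
    a_centred = best_intercept(b, centred)
    a_raw = best_intercept(b, raw)
    print(f"b = {b}: best a = {a_centred:.3f}, best a' = {a_raw:.3f}")
```

In the centred form the minimizing intercept stays at $\bar{y}$ whatever slope is chosen, whereas in the uncentred form it moves with the slope, which is the behaviour seen in the sections of Figs. 12.7.3 and 12.7.5.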






What does linear mean ? 

The term linear, as usually used by statisticians, embraces more
than the simple straight line. It includes any relationship of the form

$$Y = a + bx_1 + cx_2 + dx_3 + \dots, \tag{12.7.11}$$

where $x_1, x_2, x_3, \dots$ are independent variables (see § 12.1; examples
are given below), and $a, b, c, d, \dots$ are estimates of parameters. This
relationship includes, as a special case, the straight line ($Y = a + bx$),
which has already been discussed at length. Equation (12.7.11) is
described as a multiple linear regression equation (the 'linear' bit is,
sad to say, often omitted). As well as describing straight line
relationships for several variables ($x_1, x_2, \dots$), (12.7.11) also includes,
for example, the parabola (or second degree polynomial, or quadratic),
$Y = a + bx + cx^2$, as the special case in which $x_2$ is the square of $x_1$.
(As discussed in § 12.1, an 'independent variable' in the regression
sense is simply one the value of which can be fixed precisely by the
experimenter; it does not matter that in this case $x_1$ and $x_2$ are not
independent in the sense of §§ 2.4 and 2.7 since their covariance is not
zero. All that is required is that the values of $x_1, x_2, \dots$ be known
precisely.) The parabola is not a straight line of course, but it is linear 
in the sense that Y is a linear function (p. 39) of the parameters if the x 
values are regarded as constants (they are fixed when the experiment is 
designed). This is the sense in which 'linear' is usually used by the 
statisticians. It turns out that for (12.7.11) in general (and therefore 
for the parabola), the estimates of the parameters are linear functions 
of the observations. This has already been shown in the case of the 
straight line for which $\hat{a} = \bar{y}$, and for which $\hat{b}$ has also been shown
(eqn. (12.4.1)) to be a linear function of the observations. This means
that the parameter estimates will be normally distributed if the 
observations are, and the standard deviations of the estimates can be 
found using (2.7.11). Also, if the parameter estimates are normally 
distributed, it is a simple matter to interpret their standard deviations 
in terms of significance tests or confidence limits. Furthermore, linear 
problems (including polynomials) give rise to linear simultaneous 
equations (like (12.7.8) and (12.7.10)) which are relatively easy to 
solve (cf. §12.8). They can be handled by the very elegant branch of 
mathematics known as matrix algebra, or linear algebra (see, for example, 
Searle (1966), if you want to know more about this). It is doubtless 
partly the aesthetic pleasure to be found in deriving analytical solutions 






in terms of matrix algebra that has accounted for the statistical litera- 
ture being heavily dominated by empirical linear models with no 
physical basis, and, much more dangerous, the widespread availability 
of computer programs for fitting such models by people who do not 
always understand their limitations (some of which are mentioned 
below). 

Polynomial curves 

It does not change the nature of the problem if some $x$ values in
(12.7.11) are powers of the others. Thus the general polynomial regression
equation

$$Y = a + bx + cx^2 + dx^3 + \dots \tag{12.7.12}$$

is still a linear statistical problem. Increasingly complex shapes can
be described by (12.7.12) by including higher powers of $x$. The highest
power of $x$ is called the degree of the polynomial, so a straight line is a
first degree polynomial, the parabola is a second degree polynomial,
the cubic equation, $Y = a + bx + cx^2 + dx^3$, is a third degree polynomial,
and so on. Just as a straight line can always be found to pass exactly
through any two points, it can be shown that a $p$th degree polynomial
can always be found that will pass exactly through any specified $p + 1$
points. Because of the linear nature of the problem discussed above, 
polynomials are relatively easy to fit (especially if the x values are 
equally spaced). Methods are given in many textbooks (e.g. Snedecor 
and Cochran (1967, pp. 349-58 and Chapter 13); Williams (1959, 
Chapter 3); Goulden (1952, Chapter 10); Brownlee (1965, Chapter 13); 
and Draper and Smith (1966)) and will therefore not be repeated here. 
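The statement that a polynomial of degree $p$ can be made to pass exactly through any $p + 1$ points is easily illustrated. The sketch below uses Lagrange's interpolation formula, which is a choice made here for brevity and is not one of the fitting methods given in the books cited; the three points are simply the first three observations of Table 12.7.1 (with $y = 6$ at $x = 0$, the value implied by the total), and the function name `lagrange` is illustrative.

```python
def lagrange(points):
    """Return the polynomial of degree len(points) - 1 through the given (x, y) points."""
    def polynomial(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    # Each factor is 1 at x = xi and 0 at x = xj
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return polynomial

points = [(-2, 1), (-1, 4), (0, 6)]   # p + 1 = 3 points, so p = 2 (a parabola)
parabola = lagrange(points)

for xi, yi in points:
    print(xi, yi, parabola(xi))       # the curve reproduces each point exactly
```

A curve that passes exactly through every observation leaves no deviations from which to estimate error, which is one reason why the degree fitted in practice is kept well below the number of points.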

Although polynomials are the only sort of curves described in most 
elementary books, they are, unfortunately, not of much interest to 
experimenters in most fields. In most cases the reason for fitting a 
curve is to estimate the values of the parameters in an equation based 
on a physical model for the process being studied (for example the
Michaelis-Menten equation in biochemistry, which is discussed in 
§ 12.8). Very few physical models give rise to polynomials, which are 
therefore mainly used in a completely empirical way. In most situations 
nothing more is learned by fitting an empirical curve (the parameters of 
which have no physical meaning), than could be made obvious by drawing a 
curve by eye. One possible exception is when the line is to be used for 
prediction, for example a calibration curve, and an estimate of error is 






required for the prediction (see § 13.14). In this case a polynomial
curve might be useful if the observed line was not straight. 

Multiple linear regression 

If, as is usually the case, the observation depends on several different 
variables, it might be thought desirable to find an equation to describe 
this dependence. For example if the response of an isolated tissue 
depended on the concentration of drug given ($x_1$, say), and also on the
concentration of calcium ($x_2$, say) present in the medium in which the
tissue was immersed, then the response, $Y$, might be described by a
multiple linear regression equation like (12.7.11), i.e.

$$Y = a + bx_1 + cx_2. \tag{12.7.13}$$

This implies that the relationship between response and drug
concentration is a straight line at any given calcium concentration, and the
relationship between response and calcium concentration is a straight
line at any given drug concentration (so the three-dimensional graph of
$Y$ against $x_1$ and $x_2$ is a flat plane). As already explained $x_1$ could be
the log of the drug concentration, and $x_2$ could similarly be some
transformation of the calcium concentration, the transformation being
chosen so that (12.7.13) describes the observations with sufficient
accuracy. Even so the linear nature of (12.7.13) is a considerable
restriction on its usefulness. Furthermore all the assumptions described 
in § 12.2 are still necessary here. The process of fitting multiple linear 
regression equations is described, for example, by Snedecor and Cochran 
(1967, Chapter 13); Williams (1959, Chapter 3); Goulden (1952, 
Chapter 8); Brownlee (1965, Chapter 13); and Draper and Smith 
(1966). 

The really serious hazards of multiple linear regression arise when the 
x values are not really independent variables in the regression sense 
(see § 12.1), i.e. when they are not fixed precisely by the experimenter, 
but are just observations of some variable thought to be related to $Y$,
the variable of interest. Data of this sort always used to be analysed 
using the correlation methods described in § 12.9, but are now very 
often dealt with using multiple regression methods. There is much to be 
said for this as long as it is remembered that however the results are 
analysed it is impossible to infer causal relationships from them (see 
also §§ 1.2 and 12.9). 

Consider the following example (which is inspired by one discussed 
by Mainland (1963, p. 322)). It is required to study the number of 






working days lost through illness per 1000 of population in various
areas of a large city. Call this number $y$. It is thought that this may
depend on the number of doctors per 1000 population ($x_1$) in the
area and the level of prosperity (say mean income, $x_2$) of the area.
Values of $y$, $x_1$, and $x_2$ are found by observations on a number of areas
and an equation of the form of (12.7.13) is fitted to the results. Even
supposing (and it is not a very plausible assumption) that such complex
results can be described adequately by a linear relationship, and that
the other assumptions (§ 12.2) are fulfilled, the result of such an
exercise is very difficult to interpret. Suppose it were found that areas
with more doctors ($x_1$) had fewer working days lost through illness ($y$).
(If (12.7.13) were to fit the observations this would have to be true
whatever the prosperity of the area.) This would imply that the
coefficient $b$ must be negative. Suppose it were also found that areas with
high incomes had few working days lost through illness (whatever
number of doctors were present in the area), so the coefficient $c$ is also
negative. Inserting the values of $a$, $b$, and $c$ found from the data into
(12.7.13) gives the required multiple regression equation. If $x_1$ in this
equation is increased $y$ will decrease (because $b$ is negative). If $x_2$ is
increased $y$ will decrease (because $c$ is negative). It might therefore be
inferred (and often is) that if more doctors were induced to go to an
area (increasing $x_1$), the number of working days lost ($y$) would decrease.
This inference implies that it is believed that the presence of a large
number of doctors is the cause of the low number of working days lost,
and the data provide no evidence for this at all. Whatever happens in
the equation, it is clear that in real life one still has no idea whatsoever
what will happen if doctors go to an area. The number of working days
lost might indeed decrease, but it might equally well increase. For
example, it might be that doctors are attracted to areas of the city
which are near to the large teaching hospitals, and that these areas also
tend to be more prosperous. It is quite likely, then, that most people in
these areas will do office jobs which do not involve much health hazard,
and this might be the real cause of the small number of working days
lost in such areas. Conversely, less prosperous areas, away from teaching
hospitals, where many people work at industrial jobs with a high
health hazard (and where, therefore, many working days are lost
through illness), attract fewer doctors. If the occupational health hazard
were the really important factor then inducing more doctors to go to
an area might, far from decreasing the number of working days lost
according to the naive interpretation of the regression equation,




actually increase the number lost, because the occupational health 
hazards would be unchanged, and the larger number of doctors might 
increase the proportion of cases of occupational disease that were
diagnosed. Similarly it cannot be predicted what effect a change in
the prosperity of an area will have on the number of working days lost.
The regression equation describes (at most) only the static situation at
the time the survey was made and says nothing at all about what would
happen if the x values were changed. 

Clearly a correlation or regression relationship based on static 
survey data of this sort, in which the x values are correlated with 
each other and with other variables that have not been included in the
regression equation because they have not been thought of or cannot 
be measured, is the sort of thing for which it is possible to think up 
half-a-dozen plausible explanations before breakfast. The only use of 
the results, apart from describing, if you are lucky, the situation as it 
was when the survey was done, is to provide hints about what sort of 
proper experiments might be worth-while. The only way to find out 
what effect increasing the number of doctors in an area has is to increase 
the number. In a proper experiment various numbers of doctors would 
be allocated strictly at random (see § 2.3) to the areas being tested.
This point has been discussed already in Chapter 1. 

Further discussions will be found in § 12.9, and in Mainland (1963, 
p. 322). More quantitative descriptions of the hazards of multiple 
regression will be found in Tukey (1954), Snedecor and Cochran (1967,
pp. 393-400), and Brownlee (1965, pp. 452-4). 

Linear models and the analysis of variance 

It is worth mentioning in passing that the analysis of variance can 
be written in the form of a multiple linear regression problem. Consider 
for example the comparison of two treatments on two independent 
samples of test objects (the problem discussed at length in Chapter 9). 
It was pointed out in § 11.2 that in doing the analysis based on Gaussian 
(normal) distribution (i.e. Student's $t$ test in the case of two samples;
see § 9.4) it is assumed that the $i$th observation on the first treatment
can be represented as $y_{i1} = \mu + \tau_1 + e_{i1}$, and for the second treatment
$y_{i2} = \mu + \tau_2 + e_{i2}$ (eqn. (11.2.1)), where $\mu$ is a constant, and $\tau_1$ and $\tau_2$ are
constants characteristic of the first and second treatments respectively.
This model can be written in the form of a multiple linear regression
equation

$$y_{ij} = \mu + \tau_1 x_1 + \tau_2 x_2 + e_{ij}, \tag{12.7.14}$$






where $x_1$ is defined to have the value 1 for all responses to treatment
1 ($j = 1$) and 0 for all responses to treatment 2 ($j = 2$), and $x_2$ is 1 for
responses to treatment 2, and 0 for responses to treatment 1. Inserting
these values, (12.7.14) reduces to $y_{i1} = \mu + \tau_1 + e_{i1}$ for treatment 1,
and to $y_{i2} = \mu + \tau_2 + e_{i2}$ for treatment 2, exactly as in § 11.2. If the
estimates of $\tau_1$ and $\tau_2$ from the data are called $b$ and $c$, and the estimate of $\mu$
is called $a$, the estimated value for the $i$th response to the $j$th treatment
becomes $Y = a + bx_1 + cx_2$, identical with (12.7.13). The estimation of
treatment effects (values of $\tau$) is the same problem as the estimation of
the regression coefficients. An intermediate level discussion of this 
approach will be found in the first (1960) edition of Brownlee's (1965) 
book. 
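The coding of treatments by variables that take only the values 0 and 1 can be illustrated as follows. The responses in this sketch are invented purely for illustration, and the constraint $b + c = 0$ is an assumption made here to give unique estimates (without some such constraint $a$, $b$ and $c$ are not separately determined, because only $a + b$ and $a + c$ enter the calculated values).

```python
# Hypothetical responses, invented for illustration only
treatment_1 = [12.0, 15.0, 11.0]
treatment_2 = [20.0, 18.0, 22.0, 19.0]

# Each row is (x1, x2, y): x1 = 1 for treatment 1, x2 = 1 for treatment 2
rows = [(1, 0, yi) for yi in treatment_1] + [(0, 1, yi) for yi in treatment_2]

mean_1 = sum(treatment_1) / len(treatment_1)
mean_2 = sum(treatment_2) / len(treatment_2)

# Least squares makes the calculated value for each treatment equal to its mean,
# i.e. a + b = mean_1 and a + c = mean_2. Assumed constraint: b + c = 0.
a = (mean_1 + mean_2) / 2
b = mean_1 - a
c = mean_2 - a

def fitted(x1, x2):
    """Calculated value Y = a + b*x1 + c*x2, as in (12.7.13)."""
    return a + b * x1 + c * x2

S = sum((yi - fitted(x1, x2)) ** 2 for x1, x2, yi in rows)
print(f"a = {a:.3f}, b = {b:.3f}, c = {c:.3f}, S = {S:.3f}")
```

The sum of squared deviations found in this way is the same as the within-treatments sum of squares, since each calculated value is simply the mean of its own treatment.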



12.8. Non-linear curve fitting and the meaning of 'best' estimate

For the purposes of illustration, the problem of fitting the Michaelis-
Menten hyperbola† (or Clark equation, or Langmuir isotherm) will be
discussed. In biochemical terms the equation states that the velocity
of an enzyme catalysed reaction is $\mathcal{V}x/(\mathcal{K} + x)$ where $x$ is the
concentration of substrate (the independent variable in the sense of § 12.1),
and the parameters (true or population values) of the equation are $\mathcal{V}$ (the
maximum velocity, approached as $x \to \infty$), and $\mathcal{K}$, the Michaelis
constant (the substrate concentration necessary for half-maximum velocity;
if $\mathcal{K} = x$ the velocity is $\mathcal{V}/2$). The observed velocity, $y$ say (the
dependent variable, see § 12.1), will differ from this by some error. If
$V$ and $K$ are estimates, from experimental results, of $\mathcal{V}$ and $\mathcal{K}$, the
estimated velocity of the reaction will be

$$Y = \frac{Vx}{K + x}. \tag{12.8.1}$$

The shape of this curve is shown in Fig. 12.8.1.

Notice that the parameters, $\mathcal{V}$ and $\mathcal{K}$, are not linearly related to $Y$
so this is a non-linear problem in the sense defined in § 12.7.
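The properties of the hyperbola mentioned above can be checked with a short sketch. The parameter values used are hypothetical, chosen only for illustration, and the function name `velocity` is not taken from the text.

```python
def velocity(x, V, K):
    """Michaelis-Menten velocity, Y = V*x / (K + x), as in (12.8.1)."""
    return V * x / (K + x)

V, K = 30.0, 15.0     # hypothetical parameter values, for illustration only

# When the substrate concentration equals K the velocity is half-maximal
half_max = velocity(K, V, K)

# At very high concentrations the velocity approaches, but never reaches, V
near_max = velocity(1e9, V, K)

# Doubling V doubles the calculated velocity at any x, but doubling K does not
# change it by any fixed factor: the factor depends on x.
x = 10.0
ratio_V = velocity(x, 2 * V, K) / velocity(x, V, K)
ratio_K_low = velocity(x, V, 2 * K) / velocity(x, V, K)
ratio_K_high = velocity(100.0, V, 2 * K) / velocity(100.0, V, K)

print(half_max, near_max)
print(ratio_V, ratio_K_low, ratio_K_high)
```

It is the way in which $K$ enters the denominator that prevents the problem being written in the form (12.7.11).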

There are many ways of estimating $\mathcal{V}$ and $\mathcal{K}$. The relative merits
of some of them will be considered below. First, the problem of finding 
least squares estimates for non-linear models will be discussed. The 

† The general formula for a hyperbola is $(Y - c_1)(x - c_2) = \text{constant}$, where the
constants $c_1$ and $c_2$ are the asymptotes of the hyperbola. If $c_1 = V$ and $c_2 = -K$,
rearranging (12.8.1) shows that $(Y - V)(x + K) = -VK = \text{constant}$, which has the same
form as the general formula.






problem of estimating the error of these estimates from the experimental
results is important but complicated, and it will not be considered here
(see Draper and Smith 1966). Oliver (1970) has given formulas for
calculating the asymptotic variances of $V$ and $K$ from the scatter of the
observations, $s(y)$. If there were several observations ($y$ values) at each
$x$ then $\sigma(y)$ would be estimated from the scatter of these values 'within
$x$ values', but in the following example, where there is only one
observation at each $x$, the best that can be done is to assume the population
curve follows (12.8.1), in which case the sum of squares of deviations
from the fitted curve, $S_{\min}$, will provide an estimate of $\sigma^2(y)$. This is exactly
like the situation for a straight line discussed in § 12.6. The formulas
involve the population values $\mathcal{V}$ and $\mathcal{K}$, for which the experimental
values $V$ and $K$ must be substituted. No allowance is made for the
uncertainty resulting from the use of sample values $V$, $K$, and $s(y)$ in
place of population values $\mathcal{V}$, $\mathcal{K}$, and $\sigma(y)$ so the formulas are to some
extent optimistic. Using them is just like using the normal deviate $u$
instead of Student's $t$ (see §§ 4.3 and 4.4).

Fig. 12.8.1. Fitting the Michaelis-Menten hyperbola (velocity plotted against
substrate concentration, $x$).
  ○  'Observed' values from Table 12.8.1.
  True (population) hyperbola (known only because the 'observations' were
     computer-simulated, not real. See discussion on p. 268). The population
     standard deviation is $\sigma(y) = 1.0$ at all $x$ values.
  -·-  Least squares estimate of population line found from 'observed' values.
  Lineweaver-Burk (LB, or double reciprocal plot) estimate of population
     line found from the same 'observations'.
The true values, $\mathcal{V}$ and $\mathcal{K}$, and their values estimated by the two methods
(from Table 12.8.4) are marked on the graph.

Least squares estimates for non-linear models 

The approach is exactly as in § 12.7. It is required to find the esti-
mates of the parameters that minimize the sum of the squares of
deviations between observed (y) and calculated (Y) velocities,
$S = \Sigma(y - Y)^2$. In this example these least squares estimates will be
denoted $\hat{V}$ and $\hat{K}$, as in § 12.7.

From (12.8.1),

    $S = \sum_j \left( y_j - \frac{V x_j}{K + x_j} \right)^2$.        (12.8.2)

If, as in § 12.7, this expression is differentiated first with respect to V
holding K constant (giving $\partial S/\partial V$), and then with respect to K holding
V constant (giving $\partial S/\partial K$), and the two derivatives equated to zero,
the result is a pair of simultaneous equations (the normal equations)
that can be solved for $\hat{V}$ and $\hat{K}$, just as (12.7.8) and (12.7.10) could be
solved for $\hat{a}$ and $\hat{b}$ in § 12.7. The only snag is that in this case they are
non-linear simultaneous equations that cannot be solved by school-
book methods. Another difficulty is that there may well be (as in this
example) more than one set of solutions. The sort of difficulty that may
be encountered can be illustrated using a numerical example. The
figures in Table 12.8.1 represent the results of an enzyme kinetic
'experiment'.

Table 12.8.1

Results of an enzyme kinetic experiment. The population (true) velocities
are also given. They are known only because the 'experiment' was not
real, but was simulated on a computer, as discussed later in this section.

    Substrate          'Observed'     Population
    concentration      velocity       mean velocity
    (x)                (y)            ($\mu$)

      2.5               5.578           4.2857
      5.0               7.282           7.5000
     10.0              12.521          12.0000
     20.0              16.138          17.1429
     40.0              23.219          21.8182

Using these figures and eqn. (12.8.2), contours for various S values
can be calculated and plotted against V and K as shown in Fig. 12.8.2
(a) and (b).

The contours are not simple ellipses like those found in § 12.7
(Fig. 12.7.2 and 12.7.4). The required solution is clearly the bottom-
most point of the valley in Fig. 12.8.2(a) (where $\partial S/\partial V$ and $\partial S/\partial K$
are simultaneously zero, see § 12.7), and it can be seen that this point
corresponds to least squares estimates of $\hat{V}$ = 31.45 and $\hat{K}$ = 15.89
at the minimum value, S = 4.323. But the contours behave in a
curious and complicated fashion in the region of negative K values
shown in Fig. 12.8.2(b). There are infinitely high ridges at K = -2.5,
K = -5, etc., because at these points K + x in eqn. (12.8.2) becomes
zero. The behaviour of the contours for large values of S
($\geq 1040.4726 = \Sigma y^2$) can be seen in Fig. 12.8.2(b). The contours all
cross each other, and cross the infinitely high ridges at K = -2.5,
-5.0, etc. The points of intersection of the contours have curious
properties. The height, i.e. the value of S, at these points depends on
the direction from which the points are approached and, although
anyone who has climbed a mountain will feel that this fact is not
surprising, topographers might think that it was a warning against
pushing the geographical analogy too far.

FIG. 12.8.2(a). Fitting the Michaelis-Menten hyperbola. Contour map of the
sum of squared deviations, S (on an axis perpendicular to the paper), against
various values of K and V. This figure is analogous to Figs. 12.7.2 and 12.7.4
which referred to the fitting of a straight line. The values of S, calculated from
eqn. (12.8.2) using the observations in Table 12.8.1, are marked on the contours.
(a) This covers the (physically important) positive values of V and K. The
minimum value of S, 4.323, at the bottom of the valley corresponds to the least
squares estimates $\hat{V}$ = 31.45 and $\hat{K}$ = 15.89.

FIG. 12.8.2(b). This shows the contour map in the region of negative
(physically impossible) K values. There is seen to be a subminimum at V = 3.244
and K = -3.793, but this corresponds to S = 764.3, a far worse fit than the lowest
minimum, S = 4.323.

The contours marked 1040 are actually for $S = \Sigma y^2 = 1040.4726$. For values
of S equal to or greater than this, the contour lines behave curiously.
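The surface described above can be explored numerically. The following is a minimal sketch in Python (not the language of the original programs), assuming the observations of Table 12.8.1 as reconstructed from the printed page; the names `x_obs`, `y_obs` and `sum_sq` are illustrative only. It evaluates S from eqn. (12.8.2) at the two minima quoted in the text, close to one of the ridges, and over a coarse grid of the kind that could be used to draw contours.

```python
# Observations from Table 12.8.1 (substrate concentration x, 'observed' velocity y).
# The digits are an assumption: they were reconstructed from a poor scan.
x_obs = [2.5, 5.0, 10.0, 20.0, 40.0]
y_obs = [5.578, 7.282, 12.521, 16.138, 23.219]


def sum_sq(V, K, x=x_obs, y=y_obs):
    """S = sum of (y - V*x/(K + x))**2 over the observations, eqn (12.8.2)."""
    return sum((yj - V * xj / (K + xj)) ** 2 for xj, yj in zip(x, y))


# Height of the surface at the two minima quoted in the text.
s_valley = sum_sq(31.45, 15.89)     # bottom of the valley, Fig. 12.8.2(a)
s_hollow = sum_sq(3.244, -3.793)    # subminimum, Fig. 12.8.2(b)

# Just beside the ridge at K = -2.5 (where K + x vanishes for x = 2.5)
# S becomes enormous.
s_near_ridge = sum_sq(10.0, -2.5 + 1e-6)

# A coarse grid over positive V and K; the smallest S on the grid lies
# in the valley but cannot be lower than the true minimum.
grid = [(sum_sq(V, K), V, K) for V in range(20, 45) for K in range(5, 30)]
s_best, v_best, k_best = min(grid)

print(s_valley, s_hollow, s_near_ridge, (v_best, k_best))
```

A grid of this sort locates the valley only roughly; the iterative methods discussed next are needed to find its bottom to several decimal places.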




There are, in fact, several solutions to the simultaneous 'normal
equations' in this case.† For example, there is another pit at the
point V = 3.244 and K = -3.793, shown in Fig. 12.8.2(b). Although
these values correspond to a minimum in S, the minimum is merely a
hollow in the mountain side. The value of S at this minimum, 764.3, is far
greater than the value of S at the least of all the minimums, 4.323, as
shown at the bottom of the valley in Fig. 12.8.2(a). If there are several
minimums, that with the smallest S, i.e. the best fitting curve, corres-
ponds to the least squares estimates. In this case (though not necessarily
in all problems) all of the subminimums correspond to negative values
of K that are physically impossible and can therefore be ruled out.

There are many methods of finding the least squares solutions (see,
for example, Draper and Smith (1966, Chapter 10), Wilde (1964)).
In almost all non-linear problems the solution involves successive
approximations (iteration). The procedure is to make a guess at the
solution and then to apply a method for correcting the guess to bring
it nearer to the correct solution. The method is applied repeatedly
until further corrections make no important difference. The final
solution should, of course, be independent of the initial guess. Geo-
metrically, the initial guess corresponds to some point on Fig. 12.8.2
(say V = 10, K = 2 for example). The mathematical procedure is
intended to proceed by steps down the valley until it reaches (sufficiently
nearly) the bottom, which corresponds to $V = \hat{V}$ and $K = \hat{K}$. One
method, which sounds intuitively pleasing, is to follow the direction of
steepest descent (which is perpendicular to the contours) from the initial
guess point to the minimum. However, inspection of Fig. 12.8.2 shows
that the direction of steepest descent often points nowhere near the
minimum. Furthermore, if the search for the minimum is started in the
precipitous terrain shown in Fig. 12.8.2(b), or if this region is reached
at some time during the search, the direction of steepest descent may be
completely misleading. Although this and other sophisticated methods
(see, e.g., Draper and Smith (1966, Chapter 10)) have had much success,
many people now favour simpler search methods which seem to be
rather more robust (see Wilde 1964). One such method which has proved
useful for curve fitting (Hooke and Jeeves 1961; Wilde 1964; Colquhoun
1968, 1969) will now be described.

† There is also, in general, the possibility of a saddle point or mountain pass when a
minimum in the plot of S against one parameter coincides with a maximum in the plot
of S against the other. Such a point also satisfies the normal equations because both
derivatives are zero.






Patternsearch minimization

In Table 12.8.2 a computer program (an Algol 60 procedure) is
given that can be used to minimize any function, i.e. that will find
the values of the k variables (in the present example k = 2 variables,
viz. K and V) required to make the function (in the present example,
S given by (12.8.2)) a minimum. The procedure was written by Bell
on the basis of the work of Hooke and Jeeves (1961). The procedure
starts from the initial guess (basepoint) by trying steps (of specified
size) in each variable to see whether the function is reduced. The
size of the reduction is not taken into account. When a successful
pattern of moves has been found it is repeated, the step size increasing
while the moves are successful (i.e. while they reduce the function
value). When the function cannot be decreased any further the step
size is reduced (by a specified factor) and a further exploration carried
out. When the steps fall below a specified size the search terminates on
the assumption that a minimum has been found. Further details are
given by Wilde (1964).

Table 12.8.2

Patternsearch procedure (in Algol 60) written by M. Bell (University of
London Institute of Computer Science) to whom I am grateful for permission
to reproduce it

A Fortran IV version can be supplied on request.
For this procedure the following must be supplied:

k = number of variables on which the function to be minimized depends
bp[1:k] = base point, the initial guesses for the values of each variable
    (parameter estimate)
np[1:k] = newpoint, a real array
step[1:k] = initial step size for altering each variable in search for better
    values
redfact[1:k] = step reduction factor for each variable (usually between 0.1
    and 0.5)
critstep[1:k] = smallest permissible step size for each variable. This controls
    accuracy with which the minimum is located.
eps = half the smallest of the critsteps
eval = number of evaluations, an integer variable
evalim = maximum permissible number of evaluations of function
pat = patternfactor (usually 1.0, but other values may help in some cases)
min = a real variable

The function to be minimized is declared as

    real procedure function (P); real array P;

(see Table 12.8.3 for an example).
On exit, after calling patternsearch,

min = minimum value of the function
np = values of the variables corresponding to the minimum (the least squares
    parameter estimates for example)
eval = number of evaluations of the function during the search

    procedure patternsearch (function, k, bp, np, step, redfact, critstep, eps,
        eval, evalim, pat, min);
      integer k, eval, evalim; real eps, pat, min;
      real array bp, np, step, redfact, critstep;
      real procedure function;
    begin real array move [1:k]; integer i, fails; real value, minstore;

      comment explore tries a step in each variable in turn, first in
        one direction and then, if that fails, in the other;
      procedure explore;
      begin real home; integer j;
        fails := 0;
        for i := 1 step 1 until k do
        begin home := np[i]; j := 1;
    ADDS: np[i] := home + step[i]; value := function (np); eval := eval + 1;
          if value < min then min := value
          else begin
            if j = 2 then begin np[i] := home; fails := fails + 1 end
            else begin step[i] := -step[i]; j := 2; goto ADDS end
          end
        end
      end of explore;

      min := function (bp); eval := 1;
    GOON: for i := 1 step 1 until k do np[i] := bp[i];
    TRY: explore;
      if fails = k then
      begin for i := 1 step 1 until k do
          if abs(step[i]) > critstep[i] then goto CONT;
        goto EXIT;
    CONT: for i := 1 step 1 until k do step[i] := redfact[i] × step[i];
        goto TRY
      end;
      for i := 1 step 1 until k do move[i] := np[i] - bp[i];
    PATTERNING: if eval > evalim then goto EXIT;
      for i := 1 step 1 until k do
      begin bp[i] := np[i]; np[i] := bp[i] + pat × move[i];
        if move[i] × step[i] < 0 then step[i] := -step[i]
      end;
      comment the evaluation at the pattern point is counted in eval;
      minstore := min; min := function (np); eval := eval + 1;
      explore;
      if min < minstore then
      begin for i := 1 step 1 until k do move[i] := np[i] - bp[i];
        for i := 1 step 1 until k do
          if abs(move[i]) > eps then goto PATTERNING
      end;
      min := minstore; goto GOON;
    EXIT: end of patternsearch;
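To make the logic of the search easier to follow, here is a hedged sketch of the same explore-and-pattern scheme in Python. It is a transcription for illustration, not Bell's program: the function and argument names follow Table 12.8.2 loosely, the observations are those of Table 12.8.1 as reconstructed from the printed page, the value used for `critstep` is an assumption, and negative V or K are excluded by returning an infinite S rather than by any device used in the original.

```python
import math

x_obs = [2.5, 5.0, 10.0, 20.0, 40.0]
y_obs = [5.578, 7.282, 12.521, 16.138, 23.219]   # reconstructed digits (assumption)


def sum_sq(p):
    """S of eqn (12.8.2) for p = [V, K]; infinite for negative V or K."""
    V, K = p
    if V < 0 or K < 0:
        return math.inf
    return sum((y - V * x / (K + x)) ** 2 for x, y in zip(x_obs, y_obs))


def pattern_search(func, bp, step, redfact, critstep, eps, evalim, pat=1.0):
    """Hooke-Jeeves style search; returns (point, minimum, evaluations)."""
    k = len(bp)
    bp, step = list(bp), list(step)
    evals = 0

    def explore(point, fmin):
        # Try +step, then -step, in each variable; keep any improvement.
        nonlocal evals
        fails = 0
        for i in range(k):
            home = point[i]
            for attempt in (1, 2):
                point[i] = home + step[i]
                value = func(point)
                evals += 1
                if value < fmin:
                    fmin = value
                    break
                if attempt == 1:
                    step[i] = -step[i]      # reverse direction and retry
                else:
                    point[i] = home         # both directions failed
                    fails += 1
        return fmin, fails

    fmin = func(bp)
    evals = 1
    while True:
        point = list(bp)
        fmin, fails = explore(point, fmin)
        if fails == k:
            if all(abs(s) <= c for s, c in zip(step, critstep)):
                return point, fmin, evals   # steps are small enough: stop
            for i in range(k):
                step[i] *= redfact[i]       # otherwise shrink the steps
            continue
        move = [n - b for n, b in zip(point, bp)]
        while True:                         # repeat the successful pattern
            if evals > evalim:
                return point, fmin, evals
            for i in range(k):
                bp[i] = point[i]
                point[i] = bp[i] + pat * move[i]
                if move[i] * step[i] < 0:
                    step[i] = -step[i]
            minstore = fmin
            fmin = func(point)
            evals += 1
            fmin, _ = explore(point, fmin)
            if fmin < minstore:
                move = [n - b for n, b in zip(point, bp)]
                if any(abs(m) > eps for m in move):
                    continue
            fmin = minstore                 # pattern failed: back to base point
            break


best, s_min, n_evals = pattern_search(
    sum_sq, bp=[2.0, 50.0], step=[1.0, 1.0], redfact=[0.2, 0.2],
    critstep=[1e-5, 1e-5], eps=0.5e-5, evalim=5000, pat=2.0)
print(best, s_min, n_evals)
```

The number of evaluations need not agree with the counts reported for the Algol program, because the stopping criterion and the treatment of negative values are assumptions here.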

Of course, if the surface has several pits patternsearch will locate only
one of them, which one depending on the initial guess, step sizes,
etc.

A typical procedure for calculating values of the function is shown in
Table 12.8.3. It calculates the sum of squared deviations (eqn. (12.8.2))
for fitting the Michaelis-Menten equation. It incorporates a simple
device for preventing the search venturing into the craggy (and physically
impossible) region of negative V and K values.

Table 12.8.3

An Algol 60 procedure for calculating the function to be minimized for
fitting the Michaelis-Menten equation. The arrays containing the n
observations, y[1:n], and the n substrate concentrations, x[1:n], are
declared and read in before calling patternsearch. If the Boolean variable
constrained is set to true the search is restricted to non-negative values of
V and K

    real procedure function (P); real array P;
    begin integer j; real S, K, V, Ycalc;
      comment negative trial values are replaced by zero if constrained;
      if constrained then for j := 1, 2 do if P[j] < 0 then P[j] := 0;
      V := P[1]; K := P[2];
      comment the sum must start from zero on every call;
      S := 0;
      for j := 1 step 1 until n do
      begin Ycalc := V × x[j]/(K + x[j]);
        S := S + (y[j] - Ycalc)↑2
      end;
      function := S
    end of function;

When the patternsearch program was used for fitting the Michaelis-
Menten curve to the results in Table 12.8.1 a minimum of S = 4.32299
was found at $\hat{V}$ = 31.45004 and $\hat{K}$ = 15.89267 after 215 evaluations of
S (from Table 12.8.3) with various trial values of V and K. In this case
the initial guesses, bp in Table 12.8.2, were set to V = 2.0, K = 50.0,
step sizes were 1.0 for both V and K, reduction factor was 0.2 for both
V and K, and critstep was 10~ a for both V and K. Patternfactor was
2.0. In another run, the same except that patternfactor was set to
1.0, virtually the same point was reached (S = 4.32299 at $\hat{V}$ = 31.45019
and $\hat{K}$ = 15.89286) after 228 evaluations of S, which is not quite as fast. If
the initial guesses were V = 1.0, K = 2.0 then again virtually the same
minimum (S = 4.32299 at $\hat{V}$ = 31.45018 and $\hat{K}$ = 15.89283) was
reached after 191 trial evaluations of S. On the other hand if the initial
guesses are V = 2.5, K = -3.8 and the step sizes 0.01 then the program
locates the subminimum (S = 764.299 at V = 3.2443 and K =
-3.793) shown in Fig. 12.8.2(b), if not constrained.

Other uses for patternsearch

The program in Table 12.8.2 can be used for any sort of minimization
(or maximization) problem. It can, for instance, be used to solve any
set of simultaneous equations (linear or non-linear). If the n equations
are denoted $f_i(x) = 0$ (i = 1,...,n) then the values of x correspond-
ing to the minimum value of $\Sigma f_i^2$ (which will be zero if the equations
have an exact solution) are the required solutions.
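The idea can be sketched as follows in Python. The two equations (x + y = 3 and xy = 2) are hypothetical, chosen only because their solutions are obvious, and the small `compass_search` routine is a simplified stand-in for patternsearch (it explores each variable in turn and halves the step when nothing improves, but makes no pattern moves).

```python
def residual_sum_sq(p):
    """Sum of squares of f1 = x + y - 3 and f2 = x*y - 2 (hypothetical equations)."""
    x, y = p
    f1 = x + y - 3.0
    f2 = x * y - 2.0
    return f1 * f1 + f2 * f2


def compass_search(func, start, step=1.0, tol=1e-8):
    """Simplified direct search: axis-parallel trial steps, halved on failure."""
    point = list(start)
    best = func(point)
    while step > tol:
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = func(trial)
                if value < best:
                    point, best, improved = trial, value, True
                    break
        if not improved:
            step *= 0.5
    return point, best


solution, s = compass_search(residual_sum_sq, [0.0, 0.0])
print(solution, s)   # the sum of squares is zero at an exact solution
```

Which of the two solutions, (2, 1) or (1, 2), is found depends on the starting point and the order in which the variables are explored, just as the pit located by patternsearch depends on the initial guess.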

The meaning of 'best' estimate 

The method of least squares has been used throughout Chapter 12
(and implicitly, in earlier chapters). It was stated in § 12.1 that least
squares (LS) estimates have certain desirable properties (unbiasedness
and minimum variance; see below) in the case of linear (see § 12.7)
problems. It cannot automatically be assumed that least squares
estimates will be the best in the case of non-linear problems (and even if
they are best, they may not be so much better than others that it is
worth finding them if doing so is much more troublesome than the
alternatives). If the distribution of the observations is normal then
the method of least squares becomes the same as the method of maxi-
mum likelihood (see Chapter 1) and this method gives estimates that
have some good properties. Maximum likelihood (ML) estimates,
however, are often biased, as in the case of the variance for which the
maximum likelihood estimate is $\Sigma(x - \bar{x})^2/N$, see § 2.6 and Appendix
1, eqn. (A 1.3.4). And in general ML estimates have minimum variance
only when the size of the sample is large (they are said to be asymp-
totically efficient, meaning that as the size of the sample tends to infinity
the variance of the ML estimate is at least as small as that of any other
estimate). Such results for large samples (asymptotic results) are often
encountered but are not much help in practice because most experi-
ments are done with more or less small samples. There are few published
results about the relative merits of different sorts of estimates for
non-linear models when the estimates are based on small experiments.
Such knowledge as there is does not contradict the view that if the
errors are roughly constant (homoscedastic) and roughly normally
distributed, it is probably safest to prefer the LS estimates in the
absence of real knowledge. The ideas involved will be illustrated by
means of the Michaelis-Menten curve fitting problem discussed above.

As in all estimation problems, there are many ways of estimating the
parameters ($\mathcal{V}$ and $\mathcal{K}$ of (12.8.1) in the present example), given some
experimental results. And as usual all methods will, in general, give
different estimates. The methods most widely used in the Michaelis-
Menten case all depend on transformation of (12.8.1) to a straight line
(cf. § 12.6). The most widely used (and worst) method is the double
reciprocal plot (or Lineweaver-Burk plot). This depends on rearrange-
ment of (12.8.1) into the form

    $\frac{1}{Y} = \frac{1}{V} + \frac{K}{V}\left(\frac{1}{x}\right)$,        (12.8.3)

which shows that a plot of 1/y against 1/x should be straight with
intercept 1/V and slope K/V. Such a plot is shown in Fig. 12.8.3.
A straight line has been fitted to the results by the simple (unweighted)
method of least squares described in § 12.5 (in laboratory practice
usually either this is done, or a line is fitted by eye). From its slope and
intercept the estimates of V and K are found to be $\hat{V}$ = 22.58 and
$\hat{K}$ = 8.16.

FIG. 12.8.3. Double reciprocal (or Lineweaver-Burk) plot (1/y against
1/x) for the 'observations' in Table 12.8.1. See also Table 12.8.4.

O 'Observations'.

Straight line fitted (see text) to 'observations'.
Intercept = $100/\hat{V}$ = 100/22.58.
Slope = $\hat{K}/\hat{V}$ = 8.16/22.58.

True line corresponding to population mean velocities in
Table 12.8.1 (i.e. $\mathcal{V}$ = 30, $\mathcal{K}$ = 15, see Table 12.8.4).
Intercept = 100/30 = 3.33.
Slope = $\mathcal{K}/\mathcal{V}$ = 0.5.

Another method is based on the rearrangement of (12.8.1) in the
form

    $Y = V - K\left(\frac{Y}{x}\right)$,        (12.8.4)

from which it is seen that a plot of y against y/x should be a straight
line with slope -K and intercept V. This plot is shown in Fig. 12.8.4.
Again a straight line was fitted using the method of § 12.5 in spite of
the fact that the abscissa, y/x, is not free of error as assumed in § 12.1
(because it now involves the observations, y). From the slope and
intercept of this line the estimates are found to be $\hat{V}$ = 25.76 and
$\hat{K}$ = 10.13.

The results of applying these various estimation methods to the 
observations in Table 12.8.1 are compared in Table 12.8.4. They are not 
very informative as they stand, but it will now be shown that they are 
not untypical. 



Table 12.8.4

                                        V        K

    True population value             30.00    15.00
    Least squares estimate            31.45    15.89
    Lineweaver-Burk estimate
      (eqn. (12.8.3))                 22.58     8.16
    y against y/x estimate
      (eqn. (12.8.4))                 25.76    10.13
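The two straight-line estimates in Table 12.8.4 can be reproduced, to within rounding, by the short Python sketch below. It assumes the observations of Table 12.8.1 as reconstructed from the printed page; `fit_line` is an illustrative helper implementing the ordinary unweighted least squares line of § 12.5, not a routine from the original text.

```python
x_obs = [2.5, 5.0, 10.0, 20.0, 40.0]
y_obs = [5.578, 7.282, 12.521, 16.138, 23.219]   # reconstructed digits (assumption)


def fit_line(u, w):
    """Unweighted least squares line w = a + b*u; returns (intercept a, slope b)."""
    n = len(u)
    u_bar = sum(u) / n
    w_bar = sum(w) / n
    sxy = sum((ui - u_bar) * (wi - w_bar) for ui, wi in zip(u, w))
    sxx = sum((ui - u_bar) ** 2 for ui in u)
    b = sxy / sxx
    return w_bar - b * u_bar, b


# Lineweaver-Burk: 1/y against 1/x has intercept 1/V and slope K/V (eqn 12.8.3).
a, b = fit_line([1.0 / x for x in x_obs], [1.0 / y for y in y_obs])
V_lb = 1.0 / a
K_lb = b / a

# y against y/x has intercept V and slope -K (eqn 12.8.4).
a, b = fit_line([y / x for x, y in zip(x_obs, y_obs)], y_obs)
V_yx = a
K_yx = -b

print(round(V_lb, 2), round(K_lb, 2), round(V_yx, 2), round(K_yx, 2))
```

Small differences in the last decimal place from the tabulated values are to be expected, since the observations are printed to only three decimal places.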



In fact the 'observations' in Table 12.8.1 were taken from a study in
which simulated† experiments were used to investigate various
methods of estimation under various conditions (Colquhoun 1969; cf.
Dowd and Riggs 1965). An 'experiment' was performed by picking at
random an observation from a normally distributed population known
to have the mean ($\mu$) given in Table 12.8.1 (and plotted in Fig. 12.8.1),
and known to have a standard deviation $\sigma(y)$ = 1.0 at every concentra-
tion (i.e. the 'observations' were homoscedastic; see Fig. 12.2.3). The
'observations' were generated using computer methods. The observa-
tions are thus known to be unbiased (their population means, $\mu$, are
known to lie exactly on the calculated curve in Fig. 12.8.1) and, unlike
what happens in any real experiment, their distribution and population
means and standard deviations are known. Seven hundred and fifty
such 'experiments' were performed, and from each 'experiment'
estimates of V and K were calculated by five methods (three of which
have been mentioned above). The resulting 750 estimates of V and K
were grouped to form histograms. The distributions so obtained of the
estimates of V are shown in Fig. 12.8.5 for three methods of estimation.

† The simulation method avoids the mathematical difficulties of finding the distribu-
tion of estimates, but the results are not very general. Fig. 12.8.5 would look different
for different sorts of error, different distributions for the observations and different
experimental designs (spacing and number of substrate concentrations, i.e. of x values).

FIG. 12.8.4. Linearized plot using y against y/x.

O 'Observations' from Table 12.8.1.

Straight line fitted (see text) to 'observations'.
Intercept = $\hat{V}$ = 25.76, slope = $-\hat{K}$ = -10.13
(see Table 12.8.4).

True line corresponding to population mean velocities in
Table 12.8.1, i.e. intercept = $\mathcal{V}$ = 30, slope = $-\mathcal{K}$ = -15.

FIG. 12.8.5. Distributions of the 750 estimates of $\mathcal{V}$ (= 30) obtained, using
three methods, in 750 simulated experiments. Abscissa: estimate of V; the true
value, $\mathcal{V}$ = 30, is marked. Top: estimates from plot of y against
y/x (as shown for one 'experiment' in Fig. 12.8.4). Middle: double reciprocal
(Lineweaver-Burk) plot (as shown for one 'experiment' in Fig. 12.8.3). Bottom:
least squares.






The distributions of estimates of K are similar, which is expected in
the light of the finding that the estimates of V and K are highly corre-
lated, i.e. experiments that yield an estimate of V that is too high tend
to give an estimate of K that is too high also, whichever method
of estimation is used. Inspection of Fig. 12.8.5 shows that in this
particular case (the $\mu$ values shown in Table 12.8.1 and Fig. 12.8.1, with
normally distributed homoscedastic observations) the method of least
squares is in fact the best of the three methods. The LS estimates are
more closely grouped round the population value ($\mathcal{V}$ = 30.0) than the
estimates found by the other methods (i.e. they have the smallest
variance), and the average value of the LS estimates (viz. 30.4) is close
to the population value (i.e. they have little bias).

By comparison the Lineweaver-Burk method is clearly terrible,
the scatter of estimates being very much greater (near infinite estimates
will be obtained when the plot in Fig. 12.8.3 goes nearly through the
origin giving $1/\hat{V} \simeq 0$, and these distort the average estimate so much
that no realistic estimate of the bias is possible).

The plot of y against y/x falls in between these extremes. In spite of
breaking the rules for fitting straight lines by having error in the
quantity (y/x) plotted along the abscissa, the estimates are obviously
much less variable than those found by the Lineweaver-Burk method
(their standard deviation is only about 28 per cent greater than that of
the LS estimates in this case). The estimates from the y vs. y/x plot
are, however, clearly consistently too low: they have a negative bias.
The average of all 750 estimates is 28.0, well below the population
value of 30.0, and about 73 per cent of estimates are too low (i.e. below
30.0). This bias is purely a property of the method of estimation. In these
simulated experiments the observations themselves were known to be
completely unbiased (a similar situation was seen in the case of the
standard deviation, see § 2.6 and Appendix 1). In real life there would
in addition be some unknown amount of bias in the observations themselves
(see §§ 1.2 and 7.2).
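A simulation of this kind is easy to repeat. The Python sketch below is an illustration under stated assumptions, not the program used in the original study: it draws normally distributed 'observations' with standard deviation 1.0 about the population means of Table 12.8.1, fits the y against y/x line to each simulated experiment, and summarizes the 750 estimates of V. The seed and the names `simulate` and `fit_line` are arbitrary, and the exact figures will vary with the random numbers used.

```python
import random

X = [2.5, 5.0, 10.0, 20.0, 40.0]
V_TRUE, K_TRUE, SIGMA = 30.0, 15.0, 1.0


def fit_line(u, w):
    """Unweighted least squares line w = a + b*u; returns (intercept a, slope b)."""
    n = len(u)
    u_bar = sum(u) / n
    w_bar = sum(w) / n
    b = (sum((ui - u_bar) * (wi - w_bar) for ui, wi in zip(u, w))
         / sum((ui - u_bar) ** 2 for ui in u))
    return w_bar - b * u_bar, b


def simulate(n_experiments=750, seed=1):
    """Estimates of V from the y against y/x plot in repeated simulated experiments."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_experiments):
        # One 'experiment': one noisy observation at each substrate concentration.
        y = [rng.gauss(V_TRUE * x / (K_TRUE + x), SIGMA) for x in X]
        intercept, _slope = fit_line([yj / xj for xj, yj in zip(X, y)], y)
        estimates.append(intercept)      # intercept estimates V (eqn 12.8.4)
    return estimates


estimates = simulate()
mean_v = sum(estimates) / len(estimates)
frac_low = sum(v < V_TRUE for v in estimates) / len(estimates)
print(mean_v, frac_low)
```

The mean of the estimates should fall noticeably below 30 and well over half of them should be too low, in line with the negative bias described above, even though the simulated observations themselves are unbiased.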

If, as is usually the case, experiments are repeated several times,
bias would be considered a more serious problem than large variance.
This is because the variance of an estimate can always be reduced by
doing a large enough number of experiments, whereas bias remains
however many experiments are averaged, and there is no way of
detecting the presence of bias from the results of repeated experiments.
These results are only valid for the particular conditions under which
they were obtained. In fact different results are obtained if the errors
are not constant or the observations not normally distributed (Dowd
and Riggs 1965). For example, if the observations are normally dis-
tributed but heteroscedastic, i.e. they do not have the same standard
deviation at each x value, then it is found, in the case when the co-
efficient of variation (standard deviation/mean) is the same at each
x value, that linear transformations give better estimates of V and K than
the least squares method (Colquhoun 1969). The only exception is the
linear Lineweaver-Burk plot which is always awful.

Why are the Lineweaver-Burk estimates so bad ? 

The problem is mainly one of weighting. In fitting the straight line
to the plot of 1/y against 1/x, the dependent variable, 1/y, has been
treated as though it had constant variance (see §§ 12.1 and 12.2),
and if the straight line is fitted by eye rather than by the method of
§ 12.5, the result is usually much the same. In fact, in this example
y had constant variance (= 1.0 at every x value). The variance of 1/y
is therefore, from (2.7.14), approximately proportional to $1/\mu^4$, which is
very far from constant. Inspection of Fig. 12.8.3 shows that in the particular
experiment illustrated the poor estimates were mainly the result of
the error in the top point of the graph (1/x = 0.4, x = 2.5). This observa-
tion was somewhat too high (see Table 12.8.1), so 1/y is too low, and
this point has been given far too much weight in plotting the straight
line in Fig. 12.8.3. It has pulled the line down, distorting the parameter
estimates. From (2.7.14), and the values of $\mu$ in Table 12.8.1, it is seen
that the variance of 1/y at x = 2.5 is approximately
$\mathrm{var}(y)/\mu^4 = 1.0/(4.2857)^4$, and the variance of 1/y at the highest
substrate concentration (x = 40) is approximately $1.0/(21.8182)^4$, so this
point is far more precise. Each
point should really have a weight inversely proportional to its variance
(see § 2.5) so the point for x = 40 (1/x = 0.025) should have
$(21.8182)^4/(4.2857)^4 \simeq 670$ times the weight of the point for x = 2.5
(1/x = 0.4), not the equal weight it was given in Fig. 12.8.3. The impression that the
point for x = 2.5 has been given far too much importance in the
Lineweaver-Burk plot is confirmed. The correctly weighted Lineweaver-
Burk plot is quite satisfactory, but in real life the weights (population
variances) would not be known so fitting it would be no less arithmetic-
ally inconvenient than finding the LS estimates.
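The arithmetic behind the factor of about 670 can be checked with a few lines of Python. This is only a sketch of the approximation var(1/y) ≈ var(y)/μ⁴ quoted from (2.7.14); the variable names are illustrative, and the population means are recomputed from the true values V = 30 and K = 15.

```python
x = [2.5, 5.0, 10.0, 20.0, 40.0]
mu = [30.0 * xj / (15.0 + xj) for xj in x]     # population mean velocities
var_y = 1.0                                     # constant variance of y

# Approximate variance of 1/y at each x, and the corresponding weights.
var_recip = [var_y / m ** 4 for m in mu]
weights = [1.0 / v for v in var_recip]

# Weight of the point at x = 40 relative to the point at x = 2.5.
ratio = weights[-1] / weights[0]
print(round(ratio, 1))
```

The ratio of weights between the highest and lowest substrate concentrations comes out close to 670, as stated in the text.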

12.9. Correlation and the problem of causality

So far in this chapter it has been assumed that the x variable (or
variables) can be fixed precisely by the experimenter. In many cases,
especially in social and behavioral sciences, when often it is not
possible, or thought not to be possible, to do proper experiments (see
Chapter 1), two (or more) variables are measured, neither (or none)
of which can be fixed by the experimenter, or assigned by him to
particular individuals. Results of this sort are far more difficult to
interpret, and therefore far less satisfactory, than the results of proper
experiments as discussed in Chapter 1, but they are sometimes un-
avoidable.

Examples of the sort of questions usually treated by correlation
methods are (a) do people with good scores in school exams also have
high scores in university exams? (b) are people who smoke a lot of
cigarettes more likely to die of lung cancer than those who smoke few?
(c) do parts of the country that have a large number of doctors per
1000 of population have more or fewer working days lost because of
illness than less well supplied areas? and so on. In each of these cases
there are two sets of figures (e.g. school and university exam scores for a
number of people) which can be plotted on a graph or scatter diagram
like those in Fig. 12.9.1. The tendency of one variable to increase (or
decrease) as the other variable increases can be measured by a correlation
coefficient. There are many different sorts of correlation coefficient, of
which two will be described briefly. For detailed descriptions of correla-
tion methods see, for example, Guilford (1954).

FIG. 12.9.1. Behaviour of the Spearman rank correlation coefficient, $r_S$,
and the product moment correlation coefficient, r, on various sorts of data
(panels (a) to (h), each marked with its values of $r_S$ and r).
Clearly non-linearity can result in coefficients of almost any value even when there
is a perfectly smooth relationship between x and y. In these small samples it can
be seen from Table 12.9.2 that there is no evidence against the null hypothesis
that the population value of $r_S$ is zero in figures (d)-(h).

If a correlation is observed between two variables (A and B say), 
and if it is large enough for it to be unlikely that it arose by chance, 
then it can be concluded that 

either (1) A causes B, 
or (2) B causes A, 

or (3) some other factor, directly or indirectly, causes both A and B, 
or (4) an unlikely event has happened and a large correlation has 
arisen by chance from an uncorrelated population (see § 6.1). 

Usually there is no reason, other than the observer's prejudice, for
preferring one of these explanations to the others. As explained in
Chapter 1, the only way to choose between (1), (2), and (3) is to do a
proper experiment. For example, using the example already discussed
in § 12.7, if it were found that areas with more doctors (x) had fewer
working days (y) lost through illness, the relationship might be pre-
sented in the form of a correlation coefficient, which would be negative,
between x and y, or by fitting a curve to the graph of y against x. If a
straight line, Y = a + bx, was an adequate representation of the
observations the slope, b, would be negative. However, as mentioned in
§ 12.2, the least squares estimate ($\hat{b}$) of the slope found by minimizing
$\Sigma(y - Y)^2$ will not be quite the same as the estimate found by using the
horizontal deviations from the line in Fig. 12.2.1 (i.e. treating x as the
dependent variable and minimizing $\Sigma(x - X)^2$). Since there is no in-
dependent variable in this case it is not obvious which line to fit. This
problem is avoided with correlation coefficients, into which x and y
enter in a symmetrical way. The interpretation of the relationship,
however it is presented, is clearly very difficult because chosen numbers
of doctors were not allocated at random to selected areas. This has
already been discussed at length in § 12.7. As stated there, and in
Chapter 1, the only way out of the difficulty is to do a proper experi-
ment.

Correlation based on ranks. Spearman's coefficient, $r_S$

This coefficient, like other methods based on ranks, does not depend
on assumptions about normal distributions or the straightness of lines.
And, like other correlation coefficients, a value of +1 corresponds to
perfect correlation between x and y, a value of 0 corresponds to no
correlation and a value of -1 corresponds to perfect negative correla-
tion (y decreasing as x increases). However, what is meant by 'perfect
correlation' is not the same for different coefficients (see Fig. 12.9.1).
In the case of the Spearman coefficient it means that the ranking of
individuals is the same for both criteria. As an example take the N = 6
pairs of observations shown in Table 12.5.1. These were analysed by
regression methods in § 12.5. They are reproduced in Table 12.9.1,
in which the ranks of the x and of the y values are given, and also
$d_i$ = difference between ranks for the ith pair of observations. In this
case one variable might be a measure of the rarity of doctors in the ith
area, and the other variable a measure of the number of working days
lost through illness in that area.



Table 12.9.1

pair                    rank      rank
no. (i)   x_i    y_i    of x_i    of y_i    d_i    d_i^2

   1      160     59      1         2       -1      1
   2      165     54      2         1       +1      1
   3      169     64      3         3        0      0
   4      175     67      4         4        0      0
   5      180     85      5         6       -1      1
   6      188     78      6         5       +1      1

Total    1037    407     21        21        0      4



The Spearman rank correlation coefficient, $r_S$, is estimated using the
same formula (eqn. (12.9.3)) as used for the Pearson coefficient (see
below), but using the ranks rather than x and y themselves. It can be
shown (e.g. Siegel 1956) that the same answer is found more easily from

$$r_S = 1 - \frac{6 \sum d_i^2}{N(N^2 - 1)} \tag{12.9.1}$$

where $\sum d_i^2$ is the sum of the squares of the differences in rank for each
pair of observations (as shown in Table 12.9.1) and N = number of
pairs. From Table 12.9.1, N = 6 and $\sum d_i^2 = 4$ so

$$r_S = 1 - \frac{6 \times 4}{6(36 - 1)} = 0.886.$$

This is a less than perfect positive correlation, as expected. If the ranks
for y had been exactly the same as those for x, all the differences,
$d_i$, would have been zero, so it is obvious from (12.9.1) that $r_S$ would
have been +1. If the ranks for y had been in exactly the opposite order
to the ranks for x then $r_S$ would have been −1. And that is about all
that can be said. In no sense does a correlation coefficient (of any sort)
of 0.886 mean '88.6 per cent perfect correlation', and clearly $r_S$ does
not measure the slope of the line when the observations (or the ranks)
are plotted against each other as shown in Fig. 12.5.1, as $r_S$ can only
vary between +1 and −1. Some examples of the Spearman and
Pearson (see below) correlation coefficients calculated from particular
sets of observations are shown in Fig. 12.9.1 to give an idea of their
properties. It is obvious from this figure that far more information is to
be gained from plotting the graph than from calculating a correlation
coefficient.
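The arithmetic of eqn. (12.9.1) can be sketched in a few lines of Python. This is only an illustration: the function names `ranks` and `spearman` are invented here, the data are the six pairs of Table 12.9.1, and the shortcut formula is valid only when there are no ties.

```python
def ranks(values):
    """Rank the values from 1 (smallest) upwards; assumes there are no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman's coefficient from eqn (12.9.1), for untied observations."""
    n = len(x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# The N = 6 pairs of observations from Table 12.9.1
x = [160, 165, 169, 175, 180, 188]
y = [59, 54, 64, 67, 85, 78]

r_s = spearman(x, y)
print(round(r_s, 3))  # 0.886
```

Identical rankings give +1 and exactly opposite rankings give −1, as stated in the text.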

Ties. Small numbers of ties can be given average ranks as in Chapters 
8-10. For a description of the corrections necessary when there are 
many ties see, for example, Siegel (1956). 

Is it unreasonable to suppose that the observed correlation arose by chance?

As usual this, put more precisely, means 'what is the probability that
a correlation coefficient differing from zero by as much as, or more than,
the observed value would be found by random sampling from an
uncorrelated population?' (see § 6.1). The exact probability can be
found in just the same sort of way as was used in Chapters 8-10. If
the observations are from an uncorrelated population each of the N!
possible rankings of y (permutations of the numbers 1 to N) would
have an equal chance of being observed in combination with a given
ranking of x. The probability of any particular ranking would therefore
be 1/N! so a correlation of +1 or −1 (when no more extreme values are
possible) will have P = 1/N! (one tail) or 2/N! (two tail, see Chapter
6). P can always be found by enumerating all N! possibilities and
seeing how many give $r_S$ equal to or larger than the observed value
(cf. Chapters 8-10). To save trouble, tables have been constructed
giving the critical values of $r_S$ corresponding to P (two tail) not more
than 0.1, 0.05, and 0.01. For samples up to N = 8 the values are shown
in Table 12.9.2.
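The enumeration just described is small enough to do by brute force when N is small. The following Python sketch (the function name `exact_spearman_p` is invented for illustration) works with the sum of squared rank differences rather than with $r_S$ itself, which avoids rounding problems; it assumes no ties and a positive observed coefficient.

```python
from itertools import permutations


def exact_spearman_p(n, observed_d2):
    """Exact one-tail and two-tail P for Spearman's coefficient (no ties).

    observed_d2 is the observed sum of squared rank differences; a small
    value corresponds to a large positive coefficient.
    """
    x_ranks = range(1, n + 1)
    max_d2 = n * (n * n - 1) // 3  # sum of d^2 when the rankings are exactly opposite
    upper = lower = total = 0
    for y_ranks in permutations(x_ranks):
        d2 = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
        total += 1
        if d2 <= observed_d2:           # coefficient at least as large as observed
            upper += 1
        if d2 >= max_d2 - observed_d2:  # coefficient at least as negative
            lower += 1
    return upper / total, (upper + lower) / total


# Example from the text: N = 6 pairs and a sum of squared differences of 4
one_tail, two_tail = exact_spearman_p(6, 4)
print(one_tail, two_tail)
```

For the example, 12 of the 720 rankings give a coefficient of 0.886 or more, so the two-tail probability is 24/720, which agrees with the tabulated statement that P is not more than 0.05.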

In the present example N = 6 and $r_S$ = 0.886 so P = 0.05 (from
Table 12.9.2).

Table 12.9.2

Critical values of $r_S$. If the observed $r_S$ (taken as positive) is equal to or
larger than the tabulated value then P (two tail) is not more than the specified
value. Reproduced from Mainland (1963), by permission of author and
publisher.

Number of              P (two tail)
pairs (N)        0.1       0.05      0.01

    4           1.000
    5           0.900     1.000
    6           0.829     0.886     1.000
    7           0.714     0.786     0.929
    8           0.643     0.738     0.881

For larger samples than 8 it is close enough to calculate

$$t = r_S \sqrt{\frac{N - 2}{1 - r_S^2}} \tag{12.9.2}$$

and refer the value of t found to tables (described in § 4.4) of Student's
t distribution with N − 2 degrees of freedom. Equivalently, when N > 8,
$r_S$ can be referred to tables (e.g. Fisher and Yates, 1963, Table VII) of
critical values of Pearson's correlation coefficient. In this case
t = 0.886 √[(6 − 2)/(1 − 0.886²)] = 3.82 with 6 − 2 = 4 degrees of freedom.
Reference to tables of t (see § 4.4) gives P ≈ 0.02, not a very good
approximation to the exact value (0.05) when N is as small as 6.
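The calculation of t from eqn. (12.9.2) is easily checked. In this sketch the function name `t_from_r` is invented, and the two critical values of t for 4 degrees of freedom are those quoted later in this section (2.776 for P = 0.05 and 3.747 for P = 0.02); no attempt is made to compute P itself.

```python
import math


def t_from_r(r, n):
    """Student's t corresponding to a correlation coefficient r from n pairs."""
    return r * math.sqrt((n - 2) / (1 - r * r))


t = t_from_r(0.886, 6)
print(round(t, 2))  # 3.82

# Critical values of t with 4 degrees of freedom, as quoted in the text
T_CRIT_P05 = 2.776
T_CRIT_P02 = 3.747
print(t > T_CRIT_P02)  # True, so the approximate P is a little below 0.02
```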

Linear correlation. Pearson's product moment correlation coefficient (r)

If x and y are both normally distributed† (see Chapter 4) the closeness
with which points cluster round a straight line is measured by Pearson's
product moment correlation coefficient, r. This measure has been met
already in § 10.7. The population value of r is estimated by

$$r = \frac{\sum (y - \bar{y})(x - \bar{x})}{\sqrt{[\sum (y - \bar{y})^2 \cdot \sum (x - \bar{x})^2]}} = \frac{\mathrm{cov}(x, y)}{\sqrt{[\mathrm{var}(y) \cdot \mathrm{var}(x)]}} \tag{12.9.3}$$

The second form follows from the definition of variance and covariance
((2.6.2) and (2.6.6)). It was shown in § 2.6 that the covariance measures
the extent to which y increases as x increases. Pearson's r will be +1
(or −1) only if the points lie exactly on a straight line as shown in
Fig. 12.9.1. The relationship between x and y may be perfectly
predictable and yet have a low correlation coefficient if the relation is not
a straight line, as illustrated in Fig. 12.9.1 (c), (d), and (g). The
information to be gained from r is therefore limited.

† It is actually assumed that x and y follow a bivariate normal distribution (see, for
example, Mood and Graybill (1963), p. 198).

Using the results in Table 12.5.1 and Table 12.9.1 as an example
once again, r can be estimated easily because the sums of squares and
products have already been calculated in § 12.5. Inserting their
values in (12.9.3) gives

$$r = \frac{511.833}{\sqrt{(526.833 \times 682.833)}} = 0.853,$$

a fairly large positive correlation. Its interpretation has been discussed
above.

To find what the probability of observing a Pearson correlation
coefficient as large as or larger than 0.853 would be, if the observations
were randomly selected from a normal population with zero correlation,
the procedure is to calculate t using (12.9.2). The value of t is referred
to the tables of Student's t distribution (described in § 4.4) with N − 2
degrees of freedom where N is the number of pairs of observations.
In the present example N = 6 so

$$t = 0.853 \sqrt{\frac{6 - 2}{1 - 0.853^2}} = 3.27.$$

Consulting the tables with 6 − 2 = 4 degrees of freedom shows that
the required probability is between P = 0.05 (corresponding to
t = 2.776) and P = 0.02 (corresponding to t = 3.747). This is low enough
to make one a little suspicious of the null hypothesis that the population
correlation is zero. (The inference was, for practical purposes, the
same when Spearman's coefficient was calculated using ranks.) The
same result can be obtained, without calculation, from tables of critical
values of r (e.g. Fisher and Yates (1963, Table VII)).

A little bit of algebra shows that the test of the hypothesis that the
population correlation coefficient is zero is identical with the test (in § 12.5)
that the population slope (regression coefficient) is zero. The value of
t just found is the same as that found at the end of § 12.5, and t² = 3.27²
= 10.7 is the value of F found in the analysis of variance of the
observations shown in Table 12.5.2.
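The Pearson calculation can be reproduced directly from the six pairs of observations. The sketch below uses only the Python standard library; the variable names are illustrative, and the data are the x and y values of Table 12.9.1.

```python
import math

# The N = 6 pairs of observations from Table 12.9.1
x = [160, 165, 169, 175, 180, 188]
y = [59, 54, 64, 67, 85, 78]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Sums of squares and products about the means
ss_x = sum((xi - xbar) ** 2 for xi in x)
ss_y = sum((yi - ybar) ** 2 for yi in y)
sp_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

r = sp_xy / math.sqrt(ss_x * ss_y)          # eqn (12.9.3)
t = r * math.sqrt((n - 2) / (1 - r * r))    # eqn (12.9.2), with n - 2 degrees of freedom

print(round(ss_x, 3), round(ss_y, 3), round(sp_xy, 3))  # 526.833 682.833 511.833
print(round(r, 3), round(t, 2), round(t * t, 1))        # 0.853 3.27 10.7
```

The last figure, t², is the variance ratio F for the test of zero slope, in line with the identity mentioned in the text.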







13. Assays and calibration curves 



'Il est vrai que certaines paroles et certaines cérémonies suffisent pour faire
périr un troupeau de moutons, pourvu qu'on y ajoute de l'arsenic.'†

VOLTAIRE 1771
(Questions sur l'Encyclopédie, 'Enchantement')

† 'Incantations will destroy a flock of sheep if administered with a certain
quantity of arsenic'
(Translation: George Eliot, Middlemarch, Chap. 17)



13.1. Methods for estimating an unknown concentration or
potency

The process of estimating an unknown concentration will be referred
to as an assay. All biological assays and most chemical assays depend
on comparison of the unknown substance with a standard so the
principles involved in both chemical and biological assays are the same.
The objects are to obtain (a) the 'best' (usually least squares, see
§§ 12.1, 12.2, 12.7, and 12.8) estimate of the unknown concentration,
(b) confidence limits for its true value, and (c) to test as many as
possible of the assumptions involved in the assay. Unfortunately
almost all the methods used involve the assumption of a Gaussian
(normal) distribution (see § 6.2). As usual it is no exaggeration to say
that there is rarely any reason to believe that this assumption is correct
so the results must be interpreted with caution as indicated in §§ 4.2, 4.6
and 7.2. A detailed account of biological assay will be found in Finney
(1964), whose notation has been used in most places to make this
standard reference book as accessible as possible.

This chapter is pretty solid and it may help to go through the
numerical examples in §§ 13.11-13.15 before looking at the theory in
§§ 13.2-13.10. The object of the theoretical part is to derive the
formulas used in parallel line assays using simple algebra only. This
means putting all the steps in, avoiding 'evidently' and 'it is obvious
that'. One result is that the theoretical part is rather long and, by
mathematicians' standards, inelegant. Another result, I hope, is that
the basis of the analysis of parallel line assays is made available to
those who, like me, prefer to have the argument laid out in words of one
syllable.

The experimental designs according to which the various concentra- 
tions of standard and unknown substance can be tested are discussed 
at the end of this section. 

All the methods to be discussed involve the assumption, which may
be tested, that the relationship between the measurement (y, e.g.
response) and the concentration (z) is a straight line. Some
transformation of either the dependent variable, y, or the independent variable, x,
may be used to make the line straight. The effects of such
transformations are discussed in § 12.2. In biological assay the transformed response
is called the response metameter (i.e. the measure of response used for
calculations) and the transformed concentration or dose is called the
dose metameter. Of course the response metameter may be the response
itself, when, as is often the case, no transformation is used.

Furthermore, all the methods to be discussed assume that the
standard and unknown behave as though they were identical, apart
from the concentration of the substance being assayed. Such assays are
called analytical dilution assays. When this condition is not fulfilled
the assay is called a comparative assay. Comparative assays occur
when, for example, the concentration of one protein is estimated
using a different protein as the standard, or when the potency of a
new drug relative to a different standard drug is wanted. (Relative
potency means the ratio of the concentrations or doses required to
produce the same response.) One difficulty with comparative assays is
that the estimate of relative concentration or potency may not be a
constant, i.e. independent of the response level chosen for the comparison,
so when a log dose scale is used the lines will not be parallel (see below).

Calibration curves 

Chemical assays are often done by constructing a calibration curve, 
plotting response metameter (e.g. optical density) against concentration 
of standard. The concentration corresponding to the optical density 
(or whatever) of the unknown solution is then read off from the calibration
curve. This sort of assay is discussed in § 13.14.

Continuous (or graded) and discontinuous (or quantal) responses

In chemical assays the 'response' is nearly always a continuous
variable (see §§ 3.1 and 4.1), for example volume of sodium hydroxide
or optical density. In biological assays this is often the case too.
For example the tension developed by a muscle, or the fall in blood
pressure, is measured in response to various concentrations of the
standard and unknown preparations. Assays based on continuous
responses are discussed in this chapter. Sometimes, however, the
proportion of individuals, out of a group of n individuals, that produced
a fixed response is measured. For example 10 animals might be given a
dose of drug and the number dying within 2 hours counted. This
response is a discontinuous variable; it can only take the values
0, 1, 2, ..., 10. The method of dealing with such responses is considered
in Chapter 14, together with the closely related direct assay in
which the dose required to produce a fixed response is measured.

One of the assumptions involved in fitting a straight line by the
methods of Chapter 12, discussed in § 12.2, is the assumption that the
response metameter has the same scatter at each x value, i.e. is
homoscedastic (see Fig. 12.2.3). This is usually assumed to be fulfilled for
assays based on continuous responses (it should be tested as described
in § 11.2). In the case of discontinuous (quantal) responses there is
reason (see Chapter 14) to believe that the homoscedasticity assumption
will not be fulfilled, and this makes the calculations more complicated.

Parallel line and slope ratio assays 

In the case of the calibration curve described in § 13.14 the abscissa is 
measured in concentration (e.g. mg/ml or molarity). It is usual in 
biological assays to express the abscissa in terms of ml of solution 
(or mg of solid) administered. In this way the unknown and standard 
can both be expressed in the same units. The aim is to find the ratio 
of the concentrations of the unknown and standard, i.e. the potency 
ratio R.

$$R = \frac{\text{concentration of unknown}}{\text{concentration of standard}} = \frac{\text{amount of standard for given effect } (z_S)}{\text{amount of unknown for same effect } (z_U)} \tag{13.1.1}$$

For example, if the unknown is twice as concentrated as the standard 
only half as much, measured in ml or mg, will be needed to produce 
the same effect, i.e. to contain the same amount of active material. 
See also §13.11. 

Suppose it is found that the response metameter y, when plotted
against the amount or dose, in ml or mg, gives a straight line. Obviously
the response should be the same (zero, or control level) when the
dose of either the standard or unknown preparation is zero. The
straight line for standard can be written $Y_S = a + b_S z_S$, where $b_S$ is
the slope, $z_S$ the dose (amount) of standard, and a the response to
zero dose ($z_S$ = 0); similarly for the unknown $Y_U = a + b_U z_U$, the
response to zero dose being a, as for the standard. When $Y_S = Y_U$
it follows that $a + b_S z_S = a + b_U z_U$ so the potency ratio, from (13.1.1),
is $R = z_S/z_U = b_U/b_S$, the ratio of the slopes of the lines, as illustrated
in Fig. 13.1.1(a). An assay in which the abscissa is the dose or amount
of substance is therefore called a slope ratio assay (cf. § 13.14). This
sort of assay is described in detail by Finney (1964).
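A small numerical check may make the slope ratio argument concrete. The intercept and slopes below are hypothetical values chosen only for illustration (they are not from any assay in this book); the sketch confirms that the ratio of equi-effective doses is the same at every response level and equals the ratio of the slopes.

```python
# Hypothetical common intercept and slopes (illustration only)
a = 2.0      # response to zero dose, the same for both preparations
b_s = 3.0    # slope for the standard
b_u = 6.0    # slope for the unknown


def dose_for_response(y, slope):
    """Dose needed to give response y on the line Y = a + slope * z."""
    return (y - a) / slope


# Ratio of equi-effective doses, z_S / z_U, at several response levels
ratios = [
    dose_for_response(y, b_s) / dose_for_response(y, b_u)
    for y in (5.0, 8.0, 14.0)
]
print(ratios, b_u / b_s)
```

With these values every ratio equals 2, the ratio of the slopes, so the unknown is twice as concentrated as the standard.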



Fig. 13.1.1. (a) Slope ratio assay. Response metameter plotted against dose.
(b) Parallel line assay. Response metameter plotted against log dose. See text for
discussion.

Consider now what happens if it is found empirically that, in order
to obtain straight lines, the response metameter must be plotted against
the logarithm of the dose, x = log z say. The ratio of doses required to
produce any arbitrary constant effect Y in Fig. 13.1.1(b) is again the
potency ratio $z_S/z_U$ from (13.1.1). Now from Fig. 13.1.1(b) the horizontal
distance between the two lines is $x_S - x_U = \log z_S - \log z_U = \log(z_S/z_U)$
= log R. So the horizontal distance is the log of the potency
ratio, and because (for analytical dilution assays, see above) the potency
ratio (R) is a constant, the horizontal distance between the lines (log R)
must also be a constant. This will be so whether or not the lines are
straight (the argument has not involved the assumption that they are),
but when they are straight it implies that they will be parallel. Assays
in which the abscissa is on a logarithmic scale are therefore called
parallel line assays. The reason for using a logarithmic dose scale is to
produce a straight line. Parallelism is a consequence of using the
logarithmic scale (see § 12.2 also). Another consequence of using the
logarithmic dose scale is that the ratio between doses is usually kept
constant so that the interval between the log doses will be constant.
The spacing of the doses is, of course, a consequence of using a
logarithmic scale, and not a reason for using it as is sometimes implied.
Furthermore, the range covered by the doses has nothing to do with
the scale chosen. A wide range can be accommodated just as easily on an
arithmetic scale as on a logarithmic scale.
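The statement that the horizontal distance is constant whether or not the lines are straight can be illustrated numerically. The sketch below assumes, purely for illustration, a hyperbolic dose-response curve with a maximum of 100 and a hypothetical unknown that is R = 4 times as concentrated as the standard; none of these values comes from the text.

```python
import math

R = 4.0    # hypothetical potency ratio (unknown is 4 times as concentrated)
K = 10.0   # hypothetical dose of standard giving half the maximum response


def response_standard(z):
    """Assumed curved dose-response relation for the standard."""
    return 100.0 * z / (z + K)


def response_unknown(z):
    """An amount z of unknown contains as much active material as R*z of standard."""
    return response_standard(R * z)


def dose_standard(y):
    """Dose of standard giving response y (inverse of response_standard)."""
    return K * y / (100.0 - y)


def dose_unknown(y):
    """Dose of unknown giving response y."""
    return dose_standard(y) / R


# Horizontal distance between the two log dose-response curves at three levels
distances = [
    math.log10(dose_standard(y)) - math.log10(dose_unknown(y))
    for y in (10.0, 50.0, 90.0)
]
print(distances, math.log10(R))
```

Although the curve is far from straight over this range, the distance is log R at every response level.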

A similar situation arises in pharmacological studies when the log 
dose-response curve is plotted in the presence and absence of a drug 
antagonist. The parallelism of the lines can be tested as described in 
the following sections. If they are parallel the potency ratio can be 
estimated. In this context the potency ratio is the ratio of the doses of 
drug required to produce the same response in the presence and absence 
of antagonist, and is called the dose ratio. 

The rest of this chapter, except for § 13.14, will deal with parallel 
line assays with continuous responses. Sections 13.2-13.10 deal with 
the theory and numerical examples are worked out in §§13.11-13.15. 

Types of parallel line assays 

In biological assays, when the response, y, is plotted against log
dose, x, the line is usually found to be sigmoid rather than straight. But it is
often sufficiently nearly straight over a central portion for the
assumption of linearity to produce negligible error.

It is convenient to classify assays according to the number of dose
levels of each preparation used. If $k_S$ dose levels of the standard
preparation are used, and $k_U$ of unknown, the assay is described as a
($k_S + k_U$) dose assay. The properties of various types are, briefly, as
shown in Table 13.1.1.

The tests of validity possible in a (2 + 2) dose assay will now be con- 
sidered in slightly more detail before starting on the theory of parallel 
line assays. It is intuitively plausible that the following tests can be 
done (see § 13.7 for details). 

(1) For slope (i.e. due to linear regression, see § 12.3). The null
hypothesis that the slope of the response-log dose curve is zero is
tested. Obviously the assay is invalid unless it can be rejected. Possible
reasons for an increase in dose not causing an increase in response are
(a) insensitive test object, (b) doses not sufficiently widely spaced, or
(c) responses all supramaximal.

(2) For difference between standard and unknown preparations, i.e.
is the average response to the standard different from that to the
test preparation? This is not usually of great interest in itself though
it helps precision if $\bar{y}_S$ and $\bar{y}_U$ are not too different (see § 13.6). It will
be seen later that this test emerges as a side effect of doing tests (1) and
(3).

(3) Deviations from parallelism. The null hypothesis that the true
(population) slopes for standard and test preparation are equal is
tested. If this hypothesis is rejected the assay must be considered
invalid. In an analytical dilution assay the most probable cause of
non-parallelism is that one of the preparations is off the linear part of the
log dose-response curve. This is shown in Fig. 13.1.2.

Table 13.1.1

Number of doses of
Std ($k_S$)   Unknown ($k_U$)

    1            1         The responses to the two doses must be exactly
                           matched. If not too much exactness is demanded
                           this may be possible to achieve once, but a single
                           match would allow no estimate of error. If the
                           doses were given several times it is most improbable
                           that the means would match and so no
                           result could be obtained. This matching assay is
                           therefore unsatisfactory.

    2            1         The response-log dose line for the standard can
                           be drawn with two points, if it is already known
                           to be straight, and the dose of standard needed to
                           produce the same response as the unknown can
                           be interpolated. Error can be estimated. The
                           assumption that the slope of the line is not zero
                           can be tested but the assumptions of linearity and
                           parallelism cannot. (See § 13.15.)

    2            2         The (2+2) dose assay is better because, in addition
                           to being able to test the slope of the dose response
                           lines, their parallelism can be tested (see Fig.
                           13.1.2). It is still necessary to assume that they are
                           straight.

    3            3         With a (3+3) dose assay the assumptions of
                           non-zero slope, parallelism, and linearity can all
                           be tested.




[Figure with panels (a) and (b); abscissa: log dose (x).]

Fig. 13.1.2. Apparent deviations from parallelism can result when some
doses are not on the straight part of the dose response curve, as shown in (b),
even when the horizontal distance between the two curves is constant.
Legend: ○ observations; straight line fitted to observations; true
response-log dose curve.



Symmetrical parallel line assays 

In the following section it will become obvious that the calculations
can be very greatly simplified when the assay is symmetrical. In this
context symmetry means that the assay has (a) the same number of
dose levels of each preparation, usually either (2+2) or (3+3), (b) each
dose is administered the same number of times, (c) the ratios between
all doses are equal, and the same for both standard and unknown,
i.e. the intervals between doses are equal on the logarithmic scale.
These conditions are summarized precisely in eqns. (13.8.1).

Designs for the administration of standard and unknown 

Any of the usual experimental designs, some of which were described
in Chapter 11, may be used. The various concentrations of standard and
unknown are the treatments. See also § 13.8.

For example in a (3+3) dose assay there are 6 different solutions,
each of which is to be tested several (say n) times. The 6n tests may be
done in a completely random fashion as described in § 11.4. If each dose
is tested on a separate animal this means allocating the 6n doses to
6n animals strictly at random (see §§ 2.3 and 11.4). Often all
observations are made on the same individual (e.g. the same spectrophotometer
or the same animal). In this case the order in which the 6n tests are
done must be strictly random (see § 2.3), and, in addition, the size of a
response must not be influenced by the size of previous responses (see
discussion of single subject assays below).

If, for example, all 6n responses could not be obtained on the same 
animal, it might be possible to obtain 6 responses from each of n 
animals, the animals being blocks as described in § 11.6. Examples of 
assays based on randomized block designs (see § 11.6) are given in 
§§ 13.11 and 13.12. A second source of error could be eliminated by 
using a 6 x 6 Latin square design (this would force one to use n = 6 
replicate observations on each of the 6 treatments). However it is 
safer to avoid small Latin squares (see § 11.8). 
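The two simplest allocations described above can be sketched with Python's standard random number generator. The treatment labels, the value n = 4 and the seed are all hypothetical choices for illustration; the seed is fixed only so that the sketch is reproducible, and a real experiment should use a fresh random allocation (see § 2.3).

```python
import random

# Hypothetical labels for the 6 treatments of a (3+3) dose assay
treatments = ["S1", "S2", "S3", "U1", "U2", "U3"]
n = 4  # hypothetical number of replicates of each treatment

rng = random.Random(1971)  # fixed seed used here only for reproducibility

# Completely randomized design: all 6n tests in one random order
completely_random = treatments * n
rng.shuffle(completely_random)

# Randomized block design: each of n blocks (e.g. animals) receives
# every treatment once, in an independently randomized order
blocks = [rng.sample(treatments, k=len(treatments)) for _ in range(n)]

print(completely_random)
for number, block in enumerate(blocks, start=1):
    print("block", number, block)
```

In both designs every treatment is tested n times; the designs differ only in whether the randomization is restricted to within blocks.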

If the natural blocks were not large enough to accommodate all the 
treatments (for example, if the animals survived long enough to receive 
only 2 of the 6 treatments), the balanced incomplete block design could 
be used. References to examples are given in § 11.8 (p. 207). 

The analysis of assays based on all of these designs is done using 
Gaussian methods. Many untested assumptions are made in the analysis 
and the results must therefore be treated with caution, as described in 
§§ 4.2, 4.6, 7.2, 11.2, and 12.2. In particular, the estimate of the error 
of the result is likely to be too small (see § 7.2). 

Single subject assays 

Assays in which all the doses are given, in random order, to a single
animal or preparation (e.g. in the example in § 13.11, a single rat
diaphragm) are particularly open to the danger that the size of a
response will be affected by the size of the preceding response(s).
Contrary to what is sometimes said, the fact that responses are evoked
in random order does not eliminate the requirement that they be
independent of each other. Special designs have been worked out to
make allowance for the effect of one response on the next, but it is
necessary to assume an arbitrary mathematical model for the
interaction so it is much better to arrange the assay so as to prevent the
effect of one dose on the next (see, for example, Colquhoun and Tattersall
(1969)). If the doses have to be well separated in time to prevent
interaction it may not be possible to give all the treatments to one subject,
so an incomplete block design may have to be used (see § 11.8 and, for
example, Colquhoun (1963)). The problem is discussed by Finney
(1964, p. 291).

13.2. The theory of parallel line assays. The response and dose
metameters

Response metameter (y)

The object is to transform the response so that it becomes normally
distributed, homoscedastic, and produces a straight line when plotted
against log dose (see §§ 11.2, 12.2, p. 221, and 13.1). In many cases the
response itself is used. A linear transformation of the response, of the
form $y' = c_1 + c_2 y$ where $c_1$ and $c_2$ are constants, may be used to simplify
the arithmetic. This will not affect the distribution, scedasticity, or
linearity. For example, in § 11.4 each observation was reduced by 100
to make the numbers smaller. For tests of normality, see § 4.6.

The dose metameter (x)

For parallel line assays this is, of course, by definition (see § 13.1),
the logarithm of the dose. The dose (measured in volume or weight) is
denoted z, as in § 13.1. Thus

$$x = \log z \tag{13.2.1}$$

Usually logarithms to base 10 (common logs) will be used because the
tables are the most convenient; but it will be shown that for parallel
line assays which are symmetrical, as defined in § 13.1 and eqn. (13.8.1),
it will make the calculations much simpler to use a different base for
the logarithms. This will not, of course, affect the linearity or parallelism
of the lines. At this stage this only looks like an additional complication,
but the simplification will become apparent later. Numerical examples
are worked out in §§ 13.11, 13.12, 13.13, and 13.15.

The symmetrical (2+2) dose assay

Suppose that the ratio between the high and low doses is D, both for
the standard and for the unknown. Suppose further that each dose is
given n times so the total number of observations is N = 4n. Of these
$n_S = 2n = \tfrac{1}{2}N$ are standards and $n_U = \tfrac{1}{2}N$ are unknowns.

If the low doses of standard and unknown preparations are $z_{LS}$ and
$z_{LU}$ then, by definition of D, the high doses will be

$$z_{HS} = D z_{LS} \quad \text{and} \quad z_{HU} = D z_{LU}. \tag{13.2.2}$$

The most convenient base for the logarithms is $\sqrt{D}$. This looks most
improbable at first sight, but the reason why it is so will now be shown.
Taking the logarithms to the base $\sqrt{D}$ of the doses (remembering that
$\log_{\sqrt{D}} D = 2$ whatever the value of D, because the log is defined as the
power to which the base must be raised to give the argument) gives,
from (13.2.1) and (13.2.2),

$$x_{LS} = \log_{\sqrt{D}} z_{LS} \tag{13.2.3}$$

$$x_{HS} = \log_{\sqrt{D}} z_{HS} = \log_{\sqrt{D}} (D z_{LS}) = \log_{\sqrt{D}} D + \log_{\sqrt{D}} z_{LS} = 2 + x_{LS}. \tag{13.2.4}$$

Similarly, for the unknown, $x_{LU} = \log_{\sqrt{D}} z_{LU}$, $x_{HU} = 2 + x_{LU}$.

The mean value of the log dose for the standard preparation, if the
high and low doses are given an equal number of times (n), will be,
using (13.2.4),

$$\bar{x}_S = \frac{n x_{LS} + n x_{HS}}{2n} = \frac{x_{LS} + (2 + x_{LS})}{2} = 1 + x_{LS} \quad \text{and} \quad \bar{x}_U = 1 + x_{LU}. \tag{13.2.5}$$

Combining these results with (13.2.4) gives

$$(x_{HS} - \bar{x}_S) = +1, \quad (x_{LS} - \bar{x}_S) = -1, \quad \text{and similarly} \quad (x_{HU} - \bar{x}_U) = +1, \quad (x_{LU} - \bar{x}_U) = -1. \tag{13.2.6}$$

Using logs to the base $\sqrt{D}$ has made $(x - \bar{x})$ take the value +1 for
the high doses (of both standard and unknown), and −1 for the low
doses. This means that $(x - \bar{x})^2 = 1$ for every dose; and since there are
$\tfrac{1}{2}N$ doses of standard and $\tfrac{1}{2}N$ doses of unknown, it follows from (2.1.7)
that

$$\sum (x_S - \bar{x}_S)^2 = \sum (x_U - \bar{x}_U)^2 = \tfrac{1}{2}N \tag{13.2.7}$$

where the summations are over all $\tfrac{1}{2}N$ doses of standard (or unknown).
Thus the total sum of squares for x, pooling standard and unknown, is

$$\sum_{S,U} \sum (x - \bar{x})^2 = \sum (x_S - \bar{x}_S)^2 + \sum (x_U - \bar{x}_U)^2 = N, \tag{13.2.8}$$

where the symbol $\sum_{S,U}$ means 'add the value of the following for the
standard to its value for the unknown' (as shown in the central
expression of (13.2.8)). The sums of squares are greatly simplified by using
logs to the base $\sqrt{D}$.

The symmetrical (3+3) dose assay

The most convenient base for the logarithms in this case is D rather
than $\sqrt{D}$. The low, middle, and high doses will be indicated by the
subscripts S1, S2, and S3 for standard, and U1, U2, and U3 for
unknown. The ratio between each dose and the one below it is D, as
before. Thus

$$z_{S3} = D z_{S2} = D^2 z_{S1} \tag{13.2.9}$$

Taking logarithms to the base D (remembering $\log_D D = 1$ and
$\log_D D^2 = 2$, whatever the value of D) gives

$$x_{S1} = \log_D z_{S1},$$
$$x_{S2} = \log_D z_{S2} = \log_D (D z_{S1}) = \log_D D + \log_D z_{S1} = 1 + x_{S1}, \tag{13.2.10}$$
$$x_{S3} = \log_D z_{S3} = \log_D D^2 + \log_D z_{S1} = 2 + x_{S1}.$$

The mean standard dose, if each dose level is given the same number
of times (n), will be, using (13.2.10),

$$\bar{x}_S = \frac{n x_{S1} + n x_{S2} + n x_{S3}}{3n} = 1 + x_{S1} \quad \text{and} \quad \bar{x}_U = 1 + x_{U1}. \tag{13.2.11}$$

Combining this with (13.2.10) gives, for the standard,

$$(x_{S1} - \bar{x}_S) = -1, \quad (x_{S2} - \bar{x}_S) = 0, \quad (x_{S3} - \bar{x}_S) = +1 \tag{13.2.12}$$

and similarly $(x_U - \bar{x}_U) = -1, 0, +1$ for low, middle, and high doses of
unknown.

Because the assay is symmetrical (see §§ 13.1 and 13.8.1) each dose
is given the same number of times, n. The total number of observations
is N = 6n and the number of standards is $n_S = 3n = \tfrac{1}{2}N$, and of
unknowns $n_U = 3n = \tfrac{1}{2}N$. Now $(x - \bar{x})^2 = +1$ for all high and low
doses, and 0 for all middle doses so

$$\sum (x_S - \bar{x}_S)^2 = \sum (x_U - \bar{x}_U)^2 = \tfrac{1}{3}N \tag{13.2.13}$$

the summations being over all $\tfrac{1}{2}N$ doses (n low, n middle, and n high)
of standard or unknown. The total sum of squares for x, pooling
standard and unknown, is

$$\sum_{S,U} \sum (x - \bar{x})^2 = \sum (x_S - \bar{x}_S)^2 + \sum (x_U - \bar{x}_U)^2 = \tfrac{2}{3}N \tag{13.2.14}$$

where $\sum_{S,U}$ means sum over preparations as in (13.2.8).
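The effect of the choice of base in the two symmetrical designs can be checked numerically. In the sketch below the dose ratio D, the low dose and the number of replicates n are hypothetical values, and the function name `centred_log_doses` is invented for illustration; logarithms to an arbitrary base are obtained as a ratio of natural logarithms.

```python
import math


def centred_log_doses(low_dose, ratio, levels, base):
    """Return x - mean(x) for `levels` doses spaced by `ratio`, using logs to `base`."""
    doses = [low_dose * ratio ** k for k in range(levels)]
    x = [math.log(z) / math.log(base) for z in doses]
    xbar = sum(x) / len(x)
    return [xi - xbar for xi in x]


D = 2.5   # hypothetical ratio between successive doses
n = 4     # hypothetical number of times each dose is given

# (2+2) dose assay: logs to base sqrt(D) give deviations of -1 and +1
two_dose = centred_log_doses(0.3, D, 2, math.sqrt(D))

# (3+3) dose assay: logs to base D give deviations of -1, 0 and +1
three_dose = centred_log_doses(0.3, D, 3, D)

# Each dose level occurs n times for each of the two preparations
ss_two = 2 * n * sum(d * d for d in two_dose)      # eqn (13.2.8): N, with N = 4n
ss_three = 2 * n * sum(d * d for d in three_dose)  # eqn (13.2.14): 2N/3, with N = 6n

print(two_dose, ss_two)
print(three_dose, ss_three)
```

The low dose cancels out of the deviations, so the result does not depend on the particular doses chosen, only on their being equally spaced on the logarithmic scale.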

13.3. The theory of parallel line assays. The potency ratio 

This discussion applies to any parallel line assay, symmetrical 
(see §§ 13.1 and 13.8) or not. Numerical examples are given in §§ 13.11, 
13.12, 13.13, and 13.15. 

According to (13.1.1) the ratio of the concentration of unknown to 
concentration of standard is 

$$R = \frac{\text{concentration of unknown}}{\text{concentration of standard}}
= \frac{\text{amount of standard for given effect}}{\text{amount of unknown for same effect}}
= \frac{z'_S}{z'_U} \tag{13.3.1}$$

where the prime indicates doses estimated to produce identical responses. 
As in § 13.1 the dose z will be measured in the same units (e.g. volume of 
solution, or weight of solid) for both standard and unknown (what 
happens when the units are different is explained in the numerical 
example in § 13.11). 

The conventional symbol for the log of the potency ratio is M so,
from (13.3.1),

$$M = \log R = \log z'_S - \log z'_U = x'_S - x'_U. \tag{13.3.2}$$

As in § 13.1, $M = x'_S - x'_U$, the difference between the logs of
equieffective doses, is the horizontal distance between the parallel lines as
shown in Fig. 13.3.1. The least squares estimate of this quantity will
now be derived.

When straight lines are fitted to the standard and unknown responses 
the lines are constrained to be parallel, i.e. an average of the observed 
slopes for S and U is used for both (see § 13.4). If this common slope 
is called b, the linear regression equations (see §§ 12.1 and 12.7, and 
eqn. (12.3.1)) are written 

$$\begin{aligned}
Y_S &= \bar{y}_S + b(x_S - \bar{x}_S), \\
Y_U &= \bar{y}_U + b(x_U - \bar{x}_U).
\end{aligned} \tag{13.3.3}$$

When the response is the same for standard and unknown $Y_S = Y_U$, so
these can be equated giving

$$\bar{y}_S + b(x'_S - \bar{x}_S) = \bar{y}_U + b(x'_U - \bar{x}_U)$$

where $x'_S$ and $x'_U$ are the log doses giving equal responses as above.
Rearranging this to give $M = x'_S - x'_U$ from (13.3.2), gives the result

$$M = \log R = x'_S - x'_U = (\bar{x}_S - \bar{x}_U) + \frac{\bar{y}_U - \bar{y}_S}{b}. \tag{13.3.4}$$

[Fig. 13.3.1. Geometrical meaning of the equation (derived in the text)
for the log potency ratio (M = log R) in any parallel line assay.
Horizontal axis: log dose (x).]

The geometrical meaning of the right-hand side is illustrated in Fig.
13.3.1. To find the potency ratio, R, the antilog of M can be found
from tables if common logarithms (to base 10) have been used. However
in symmetrical assays it has been shown that it is better to use
logarithms to a different base, say base r in general (it was shown in § 13.2
that $r = \sqrt{D}$ is best for 2+2 dose assays and $r = D$ for 3+3 dose
assays). Since antilog tables are available only for base 10 logarithms
it will be necessary to convert to base 10 before looking up antilogs.
The general formula† for changing the base of logarithms from a to b is

$$\log_b x = \log_a x \cdot \log_b a \tag{13.3.5}$$

from which it follows that

$$\log_{10} R = \log_r R \cdot \log_{10} r = M \log_{10} r. \tag{13.3.6}$$

† Proof. From the definition of logs, $\operatorname{antilog}_a(\log_a x) = a^{\log_a x} = x$. Also, in general,
$n \log x = \log x^n$. Thus $\log_a x \cdot \log_b a = \log_b (a^{\log_a x}) = \log_b x$.




Therefore, multiplying (13.3.4) by the conversion factor, $\log_{10} r$, gives

$$\log_{10} R = \left[ (\bar{x}_S - \bar{x}_U) + \frac{\bar{y}_U - \bar{y}_S}{b} \right] \log_{10} r. \tag{13.3.7}$$

This is a perfectly general formula whatever sort of logarithms are used.
If common logs were used, r = 10 so the conversion factor is $\log_{10} 10 = 1$.

For symmetrical assays (as defined in (13.8.1) and at the end of
§ 13.1) this expression can be simplified, as shown in § 13.10.
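
The calculation in (13.3.4) and (13.3.7) is short enough to sketch in Python. This is a minimal illustration, not a complete assay analysis: the function names and all the numerical values are assumptions made up for the example, and the slope b is taken as already known.

```python
import math

def log_potency_ratio(xbar_s, xbar_u, ybar_s, ybar_u, b):
    """M = (xbar_S - xbar_U) + (ybar_U - ybar_S)/b, as in eqn (13.3.4)."""
    return (xbar_s - xbar_u) + (ybar_u - ybar_s) / b

def potency_ratio(M, r=10.0):
    """Antilog of M when the log doses were taken to base r, via (13.3.6)."""
    log10_R = M * math.log10(r)      # convert to common logarithms
    return 10.0 ** log10_R

# Assumed (made-up) summary values with log doses taken to base r = 2.
M = log_potency_ratio(xbar_s=1.0, xbar_u=1.5, ybar_s=40.0, ybar_u=52.0, b=12.0)
R = potency_ratio(M, r=2.0)
```

With these invented figures M = 0.5 in base 2 logarithms, so R is the square root of 2; multiplying by $\log_{10} r$ before taking the antilog gives the same answer as computing $r^M$ directly.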



13.4. The theory of parallel line assays. The best average slope 

For estimation of the potency ratio it is essential that the response- 
log dose lines be parallel (see §§ 13.1, 13.3). Inevitably the line fitted 
to the observations on the standard will not have exactly the same slope 
as the line fitted to the unknown; but, if the deviation from parallelism 
is not greater than might reasonably be expected on the basis of 
experimental error, the observed slope for the standard ($b_S$) is averaged
with that for the unknown ($b_U$) to give a pooled estimate of the
presumed common value.

By 'best' is meant, as usual, least squares (see §§ 12.1, 12.2, 12.7,
and 12.8). A weighted average of the slopes for standard and unknown is
found using (2.5.1). Calling the average slope b, this gives

$$b = \frac{\sum_{S,U} w b}{\sum_{S,U} w} = \frac{w_S b_S + w_U b_U}{w_S + w_U} \tag{13.4.1}$$

where the weights are the reciprocals of the variances of the slopes
(see § 2.5). The estimated variances of the individual slopes, by (12.4.2),
are

$$\operatorname{var}[b_S] = \frac{s^2[y]}{\sum (x_S - \bar{x}_S)^2}, \qquad
\operatorname{var}[b_U] = \frac{s^2[y]}{\sum (x_U - \bar{x}_U)^2} \tag{13.4.2}$$

where $s^2[y]$ is, as usual, the estimated error variance of the observations
(the error mean square from the analysis of variance).



Now in general the variance of the weighted mean $\bar{y} = \sum w_i y_i / \sum w_i$
will be given by (2.7.12) as

$$\operatorname{var}[\bar{y}] = \frac{1}{\sum w_i}. \tag{13.4.3}$$

Taking $w_S = 1/\operatorname{var}[b_S]$ and $w_U = 1/\operatorname{var}[b_U]$ from (13.4.2), and
using the estimate of the slope from (12.2.7) gives

$$w_S b_S = \frac{\sum (x_S - \bar{x}_S)^2}{s^2[y]} \cdot \frac{\sum y_S (x_S - \bar{x}_S)}{\sum (x_S - \bar{x}_S)^2}
= \frac{\sum y_S (x_S - \bar{x}_S)}{s^2[y]} \tag{13.4.4}$$

and similarly for unknown. Inserting these results in (13.4.1) gives the
weighted average slope

$$b = \frac{\sum y_S (x_S - \bar{x}_S) + \sum y_U (x_U - \bar{x}_U)}{\sum (x_S - \bar{x}_S)^2 + \sum (x_U - \bar{x}_U)^2}
= \frac{\sum_{S,U} \sum y (x - \bar{x})}{\sum_{S,U} \sum (x - \bar{x})^2} \tag{13.4.5}$$

where the symbol $\sum_{S,U}$ means, as before, 'add the value of the following
quantity for the standard to its value for the unknown'. In other
words, the average slope is simply (pooled sum of products for S and U)/
(pooled sum of squares of x).

For symmetrical assays it was shown in § 13.2 that $\sum (x - \bar{x})^2$ is the
same for standard and unknown so, from (13.4.2), the weights are
equal and the two slopes ($b_S$ and $b_U$) are simply averaged.

From (13.4.2) and (13.4.3) it follows that the variance of the average
slope is, in general, estimated as

$$\operatorname{var}[b] = \frac{1}{\sum w} = \frac{s^2[y]}{\sum (x_S - \bar{x}_S)^2 + \sum (x_U - \bar{x}_U)^2}
= \frac{s^2[y]}{\sum_{S,U} \sum (x - \bar{x})^2} \tag{13.4.6}$$

(compare this with (12.4.2)). It is, of course, being assumed that the
variance of the observations, $s^2[y]$, is the same for standard and
unknown as well as for each dose level (see §§ 11.2, 12.2, and 13.1).
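
A small Python sketch may make the pooling of slopes concrete. It is only an illustration under stated assumptions: the responses, the log doses and the error variance are invented, and the function names are hypothetical. It computes the pooled slope as (pooled sum of products)/(pooled sum of squares of x) and confirms that this equals the weighted average of the separate slopes, with weights proportional to $\sum (x - \bar{x})^2$.

```python
def sums_about_mean(x, y):
    """Return (sum of y*(x - xbar), sum of (x - xbar)^2) for one preparation."""
    x_bar = sum(x) / len(x)
    sp = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    return sp, ssx

def pooled_slope(x_s, y_s, x_u, y_u):
    """Average slope as pooled products over pooled squares (eqn 13.4.5)."""
    sp_s, ssx_s = sums_about_mean(x_s, y_s)
    sp_u, ssx_u = sums_about_mean(x_u, y_u)
    b = (sp_s + sp_u) / (ssx_s + ssx_u)
    return b, sp_s / ssx_s, sp_u / ssx_u, ssx_s, ssx_u

# Invented data; the unknown uses more widely spaced log doses (unsymmetrical).
x_s, y_s = [0, 0, 1, 1], [10, 12, 20, 22]
x_u, y_u = [0, 0, 2, 2], [14, 16, 38, 40]
b, b_s, b_u, ssx_s, ssx_u = pooled_slope(x_s, y_s, x_u, y_u)

# Weighted average of the separate slopes; the common s^2[y] cancels.
b_weighted = (ssx_s * b_s + ssx_u * b_u) / (ssx_s + ssx_u)

s2 = 2.0                               # assumed error mean square
var_b = s2 / (ssx_s + ssx_u)           # variance of the average slope (13.4.6)
```

Because the error variance appears in every weight it cancels from the average, which is why it does not need to be known in order to find b.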

13.5. Confidence limits for the ratio of two normally distributed
variables: derivation of Fieller's theorem

The solution to the problem posed in § 7.5 will now be given. The
result, in its general form, looks rather complicated; but the numerical
examples in §§ 13.11-13.14 show how easy it is to use.

Although the sum or difference (or any linear combination, see p. 39)
of two normally distributed (Gaussian) variables is itself normally
distributed, their ratio is not. Therefore, as discussed in § 7.5, the
methods discussed so far cannot provide confidence limits for the
ratio. A solution of the problem will now be described.

The simplest application of the result is to find the confidence limits
for the ratio (= m, say) of two means (see § 14.1), the problem
discussed in § 7.5. It is shown below that if g (eqn (13.5.8)) is very small
compared with one, so $(1 - g) \simeq 1$, the result of using Fieller's theorem
is the same as the approximate result, $m \pm t\sqrt{[\operatorname{var}(m)]}$, where var(m)
is given, approximately, by (2.7.16).

The theorem is needed to find confidence limits for the value of
the independent variable (x) necessary to produce a given value of the
dependent variable (y) as discussed in § 12.4. A numerical example of
this 'calibration curve problem' is given in § 13.14. The confidence
limits for a potency ratio are also found using Fieller's theorem.

Before considering a ratio, the argument of § 7.4 leading to
confidence limits for a single Gaussian variable, y, will be repeated in a
rather more helpful form. If y is normally distributed with population
mean $\mu$ and estimated variance $s^2$ then $y - \mu$ is normal with population
mean $= \mu - \mu = 0$ and variance $s^2$, so, as in § 4.4, $t = (y - \mu)/s$. As in
§ 7.4 the $100\alpha$ per cent confidence limits for the value of $\mu$ are based
on Student's t distribution (§ 4.4) which implies

$$P[-ts < (y - \mu) < +ts] = \alpha \tag{13.5.1}$$

or, in other words (see § 11.3, p. 182),

$$P[(y - \mu)^2 < t^2 s^2] = \alpha. \tag{13.5.2}$$

The deviation $(y - \mu)$ will border on significance when it is equal to
$-ts$ or $+ts$, i.e. when

$$(y - \mu)^2 = t^2 s^2.$$

This is a quadratic equation in $\mu$ and solving it for $\mu$ using the usual
formula† gives as the two solutions $\mu = y - ts$ and $\mu = y + ts$, the
confidence limits for $\mu$ found in § 7.4. This seems a long way round to
get the same answer as before, but the approach will now be used to
find the confidence limits for a ratio.

† In general, if $ax^2 + bx + c = 0$ then $x = [-b \pm \sqrt{(b^2 - 4ac)}]/2a$.

Consider, in general, any ratio $\mu = \alpha/\beta$. The estimate (m) of the
population value ($\mu$) from the observations will be written $m = a/b$
where a is the estimate of $\alpha$, and b is the estimate of $\beta$. The case of
interest, or, at any rate, the case to be dealt with, is when a and b are
normally distributed variables (with population means $\alpha$ and $\beta$). The
variances of a and b must be specified and this will be done using a
new notation. This notation is based on the fact that not only the
variances but also the covariances (in analysis of variance problems
that are linear in the general sense discussed in § 12.7) can be expressed
as multiples of the error variance of the observations, $s^2[y]$ (as usual
this is the error mean square from the analysis of variance). For
example, the variance of a mean is, by (2.7.8), 1/n times the error
variance. Similarly the variance of a slope, b, is, by (12.4.2), $1/\sum (x - \bar{x})^2$
times the error variance. If these multiplying factors are symbolized v
then one can define

$$\begin{aligned}
\operatorname{var}[a] &= v_{11} s^2, \\
\operatorname{var}[b] &= v_{22} s^2, \\
\operatorname{cov}[a, b] &= v_{12} s^2,
\end{aligned} \tag{13.5.3}$$



where $s^2$ is written for $s^2[y]$. The subscripts distinguishing the variance
multipliers, v, are arbitrary (cf. § 2.1), but the notation used emerges
naturally from a more advanced treatment, and is that used in Finney
(1964), who discusses Fieller's theorem and two of its extensions. For
example, if a was a mean, then $v_{11} = 1/n$ as above.

Since a and b are normally distributed and $\mu$ is a constant, the
variable $(a - \mu b)$ is a linear function of normal variables, and is therefore
normally distributed. The population mean of $(a - \mu b)$ will be $\alpha - \mu\beta = 0$
and its estimated variance will be, using (13.5.3), (2.7.2), (2.7.6), and
(2.7.6),

$$\begin{aligned}
\operatorname{var}[(a - \mu b)] &= \operatorname{var}[a] + \operatorname{var}[\mu b] - 2 \operatorname{cov}[a, \mu b] \\
&= \operatorname{var}[a] + \mu^2 \operatorname{var}[b] - 2\mu \operatorname{cov}[a, b] \\
&= s^2 (v_{11} + \mu^2 v_{22} - 2\mu v_{12}).
\end{aligned} \tag{13.5.4}$$

Now, by direct analogy with (13.5.2), it follows from the definition of
Student's t that

$$P[(a - \mu b)^2 < t^2 s^2 (v_{11} + \mu^2 v_{22} - 2\mu v_{12})] = \alpha. \tag{13.5.5}$$

And, again by analogy, the $100\alpha$ per cent confidence limits for $\mu$ are
found by solving for $\mu$ the equation

$$(a - \mu b)^2 = t^2 s^2 (v_{11} + \mu^2 v_{22} - 2\mu v_{12}). \tag{13.5.6}$$




This is again a quadratic equation in $\mu$ and when solved for $\mu$ by the
usual formula (see above) the two solutions are the required confidence
limits for $\mu$. They are

$$\mu = \frac{1}{1 - g} \left[ m - \frac{g v_{12}}{v_{22}} \pm \frac{ts}{b}
\sqrt{ v_{11} - 2 m v_{12} + m^2 v_{22} - g \left( v_{11} - \frac{v_{12}^2}{v_{22}} \right) } \right] \tag{13.5.7}$$

where

$$g = \frac{t^2 s^2 v_{22}}{b^2}. \tag{13.5.8}$$

Simplifications of Fieller's theorem in special cases

If a and b are independent, i.e. $v_{12} = 0$, the result simplifies
considerably, giving the confidence limits for $\mu$ as

$$\mu = \frac{1}{1 - g} \left[ m \pm \frac{ts}{b} \sqrt{ (1 - g) v_{11} + m^2 v_{22} } \right]. \tag{13.5.9}$$

The quantity g defined in (13.5.8) can be considered an index of the
significance of the difference of b (the denominator of the ratio) from
zero. This is clearly important because if the denominator could be
zero, the ratio could be infinite. The effect of the $(1 - g)$ in front of the
± sign is to raise both upper and lower limits, i.e. unless g is very small
the limits are not symmetrical about m. Since $\operatorname{var}(b) = v_{22} s^2$ by
(13.5.3) it follows that if $b^2 < t^2 s^2 v_{22}$, i.e. if $g > 1$, then b would be
judged 'not significantly different' from zero at the level of significance
fixed by the value of t chosen, and useful limits could not be found. In
other words $1/g$ is the square of the ratio of b to t times the standard
deviation of b.

If g is very small, as it will be in good experiments, then the general
formula, (13.5.7), simplifies giving the (symmetrical) confidence limits
for $\mu$ as

$$m \pm \frac{ts}{b} \sqrt{ (v_{11} - 2 m v_{12} + m^2 v_{22}) }. \tag{13.5.10}$$

This is the result that would be obtained by treating m as roughly
normally distributed and calculating $m \pm t\sqrt{[\operatorname{var}(m)]}$, as in § 7.4,
using the approximate formula, (2.7.18), together with (13.5.3) to give

$$\operatorname{var}(m) \simeq m^2 \left( \frac{\operatorname{var}(a)}{a^2} + \frac{\operatorname{var}(b)}{b^2} - \frac{2 \operatorname{cov}(a, b)}{ab} \right)
= \frac{s^2}{b^2} (v_{11} - 2 m v_{12} + m^2 v_{22}). \tag{13.5.11}$$

If a and b are uncorrelated ($v_{12} = 0$), as well as $g \ll 1$, then the confidence
limits for $\mu$ can again be found as $m \pm t\sqrt{[\operatorname{var}(m)]}$, the approximate
expression for var(m), (13.5.11), simplifying even further to

$$\operatorname{var}(m) \simeq m^2 \left( \frac{\operatorname{var}(a)}{a^2} + \frac{\operatorname{var}(b)}{b^2} \right)
= \frac{s^2}{b^2} (v_{11} + m^2 v_{22}). \tag{13.5.12}$$

This is the variance given by the approximate formula, (2.7.16)
(because $C^2(m) = \operatorname{var}(m)/m^2$, etc., from the definition (2.6.4)).

Examples of the use of the results in this section will occur in §§ 13.6 
and 13.10-13.14. 
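
The general result can be sketched as a short Python function. This is an illustrative sketch rather than a definitive implementation: the function name is hypothetical, the numerical values are invented, and the value of t is simply assumed (in practice it is taken from tables of Student's t for the error degrees of freedom). The function solves the quadratic $(a - \mu b)^2 = t^2 s^2 (v_{11} + \mu^2 v_{22} - 2\mu v_{12})$ for $\mu$ in the form given by Fieller's theorem.

```python
import math

def fieller_limits(a, b, v11, v22, v12, s2, t):
    """Confidence limits for the ratio alpha/beta estimated by m = a/b.

    v11, v22, v12 are the variance multipliers, s2 the error variance,
    and t the (assumed, tabulated) value of Student's t.
    """
    m = a / b
    g = t ** 2 * s2 * v22 / b ** 2
    if g >= 1.0:
        # b is not significantly different from zero: no useful limits.
        raise ValueError("g >= 1, so finite limits cannot be found")
    inside = v11 - 2 * m * v12 + m ** 2 * v22 - g * (v11 - v12 ** 2 / v22)
    half_width = (t * math.sqrt(s2) / abs(b)) * math.sqrt(inside)
    centre = m - g * v12 / v22
    return (centre - half_width) / (1 - g), (centre + half_width) / (1 - g)

# Invented example values.
a, b = 10.0, 5.0
v11, v22, v12 = 0.1, 0.05, 0.01
s2, t = 4.0, 2.0
lower, upper = fieller_limits(a, b, v11, v22, v12, s2, t)

# Approximate symmetrical limits, valid only when g is very small.
approx_half = (t * math.sqrt(s2) / b) * math.sqrt(v11 - 2 * (a / b) * v12 + (a / b) ** 2 * v22)
approx = (a / b - approx_half, a / b + approx_half)
```

A useful check on any such function is to substitute each limit back into the quadratic: both sides should agree to within rounding error.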



13.6. The theory of parallel line assays. Confidence limits for the
potency ratio and the optimum design of assays

This discussion applies to any parallel line assay. The simplifications 
possible in the case of symmetrical assays (see § 13.1) are given later in 
§ 13.10, and numerical examples in § 13.11 onwards. 

The logarithm of the potency ratio (R) is $M = \log R$ as in § 13.3. It
will be convenient to rearrange the formula for the potency ratio,
(13.3.4), to give

$$M - (\bar{x}_S - \bar{x}_U) = \frac{\bar{y}_U - \bar{y}_S}{b}. \tag{13.6.1}$$

The term $(\bar{x}_S - \bar{x}_U)$ has zero variance because x is supposed to be
measured with negligible error (see §§ 12.1, 12.2, and 12.4), and so can be
treated as a constant. The approach is therefore to find confidence
limits for the population value of $(\bar{y}_U - \bar{y}_S)/b$ and then add the constant
$(\bar{x}_S - \bar{x}_U)$ to the results. Now if the observations are normally distributed
then so are $(\bar{y}_U - \bar{y}_S)$ and (as explained in § 12.4) the average slope, b.
The right-hand side of (13.6.1) is therefore the ratio of two normally
distributed variables, and confidence limits for it can be found using
Fieller's theorem (§ 13.5).

The variance multipliers defined in (13.5.3) are required first.
From (2.7.3) and (2.7.8) it follows that $\operatorname{var}[\bar{y}_U - \bar{y}_S] = s^2[y]/n_U + s^2[y]/n_S$
and therefore

$$v_{11} = \frac{1}{n_U} + \frac{1}{n_S} \tag{13.6.2}$$




where $n_U$ and $n_S$ are the numbers of responses to the unknown and
standard preparations. The variance of the average slope, b, in the
denominator, is, from (13.4.6), $\operatorname{var}[b] = s^2[y] / \sum_{S,U} \sum (x - \bar{x})^2$ so

$$v_{22} = \frac{1}{\sum_{S,U} \sum (x - \bar{x})^2}, \tag{13.6.3}$$

the notation being explained in § 13.4. Because it can be shown (see
§ 13.10) that $(\bar{y}_U - \bar{y}_S)$ and b are uncorrelated, i.e. have zero covariance,
it follows that $v_{12} = 0$ (see also § 12.7). Thus the simplified form of
Fieller's theorem, eqn. (13.5.9), can be used to find confidence limits for
the ratio $(\bar{y}_U - \bar{y}_S)/b$. Using (13.6.2) and (13.6.3)

$$\frac{1}{1 - g} \left[ \frac{\bar{y}_U - \bar{y}_S}{b} \pm \frac{ts}{b}
\sqrt{ (1 - g) \left( \frac{1}{n_U} + \frac{1}{n_S} \right) + \frac{(\bar{y}_U - \bar{y}_S)^2}{b^2 \sum_{S,U} \sum (x - \bar{x})^2} } \right] \tag{13.6.4}$$

where from (13.5.8) and (13.6.3),

$$g = \frac{t^2 s^2}{b^2 \sum_{S,U} \sum (x - \bar{x})^2}. \tag{13.6.5}$$

From (13.6.1) it follows that (13.6.4) gives the confidence limits for
$M - (\bar{x}_S - \bar{x}_U)$, so the confidence limits for the log potency ratio, M, are
$(\bar{x}_S - \bar{x}_U) + [13.6.4]$.

To find the confidence limits for the potency ratio itself the
antilogarithms of these limits are required. Now, as discussed in § 13.3, the
calculations are often carried out not with logarithms to base 10 of the
dose, but with some other convenient base, say r. In this case
$M = \log_r R$, and, as explained in § 13.3, it is necessary to multiply by
$\log_{10} r$, to convert to logarithms to base 10, before looking up the
antilogarithms. The confidence limits for the true value of $\log_{10} R$ are thus

$$\log_{10} r \left[ (\bar{x}_S - \bar{x}_U) + \frac{1}{1 - g} \left\{ \frac{\bar{y}_U - \bar{y}_S}{b} \pm \frac{ts}{b}
\sqrt{ (1 - g) \left( \frac{1}{n_U} + \frac{1}{n_S} \right) + \frac{(\bar{y}_U - \bar{y}_S)^2}{b^2 \sum_{S,U} \sum (x - \bar{x})^2} } \right\} \right]. \tag{13.6.6}$$

A numerical example of the use of this general equation occurs in
§ 13.13.
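
As a rough illustration of the general equation for the limits of the potency ratio, the following Python sketch applies the simplified ($v_{12} = 0$) form of Fieller's theorem with $v_{11} = 1/n_U + 1/n_S$ and $v_{22} = 1/\sum_{S,U}\sum (x - \bar{x})^2$, adds the constant $(\bar{x}_S - \bar{x}_U)$, and converts from base r to the potency ratio itself. The function name and every numerical value are assumptions invented for the example; t in particular is an assumed tabulated value.

```python
import math

def potency_ratio_limits(xbar_s, xbar_u, ybar_s, ybar_u, b, s2,
                         n_s, n_u, ssx_pooled, t, r=10.0):
    """Lower and upper confidence limits for the potency ratio R."""
    g = t ** 2 * s2 / (b ** 2 * ssx_pooled)
    if g >= 1.0:
        raise ValueError("slope not significantly different from zero")
    ratio = (ybar_u - ybar_s) / b
    root = math.sqrt((1 - g) * (1 / n_u + 1 / n_s) + ratio ** 2 / ssx_pooled)
    half = (t * math.sqrt(s2) / abs(b)) * root
    m_low = (xbar_s - xbar_u) + (ratio - half) / (1 - g)    # limits for M
    m_high = (xbar_s - xbar_u) + (ratio + half) / (1 - g)
    conv = math.log10(r)                                    # base r to base 10
    return 10 ** (m_low * conv), 10 ** (m_high * conv)

# Invented summary values; log doses assumed taken to base r = 2.
args = dict(xbar_s=1.0, xbar_u=1.0, ybar_s=40.0, ybar_u=46.0, b=12.0,
            n_s=12, n_u=12, ssx_pooled=16.0, t=2.07, r=2.0)
R_low, R_high = potency_ratio_limits(s2=4.0, **args)
R_hat = 2.0 ** ((1.0 - 1.0) + (46.0 - 40.0) / 12.0)         # point estimate

# With a negligible error variance the limits close in on the estimate.
tight_low, tight_high = potency_ratio_limits(s2=1e-12, **args)
```

The limits are found on the log scale and only then converted, so they are not symmetrical about the estimated potency ratio.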




Simplification of the calculation for good assays 

If the slope of the log dose-response line, b, is large compared with
the experimental error then g will be small (see § 13.5), so $(1 - g) \simeq 1$.
Inserting this into (13.6.6), together with the definition of $M = \log_r R$
from (13.3.4), gives the confidence limits for $\log_{10} R$ as approximately

$$\log_{10} r \left[ M \pm \frac{ts}{b}
\sqrt{ \frac{1}{n_U} + \frac{1}{n_S} + \frac{(\bar{y}_U - \bar{y}_S)^2}{b^2 \sum_{S,U} \sum (x - \bar{x})^2} } \right]. \tag{13.6.7}$$

This is equivalent to treating the log potency ratio, M, as approximately
normally distributed and calculating the limits as $M \pm t\sqrt{[\operatorname{var}(M)]}$, as
in § 7.4, with

$$\operatorname{var}(M) \simeq \frac{s^2}{b^2} \left[ \frac{1}{n_U} + \frac{1}{n_S} + \frac{(\bar{y}_U - \bar{y}_S)^2}{b^2 \sum_{S,U} \sum (x - \bar{x})^2} \right] \tag{13.6.8}$$

which can alternatively be inferred directly from (13.5.12).

The optimum design of assays

The aim is to make the confidence limits for the potency ratio as
narrow as possible, i.e. the result of the assay as precise as possible.
Ways of doing this can be inferred from (13.6.6).

(1) g should be small. In other words (see discussion at the end of
§ 13.5) the slope of the response-log dose lines should be as
large as possible relative to its standard deviation. If g is large
(approaching 1) the limits for the log potency ratio will become
wide because of the term involving g after the ± sign in (13.6.6),
and also unsymmetrical about M because of the g in the term
before the ± sign: both upper and lower limits are raised when g is
large.

(2) s, the error standard deviation, should be small. That is, the
responses should be as reproducible as possible; and the error
variance reduced, if possible, by giving the doses in designs such
as randomized blocks, as described in § 13.1 and illustrated in
§§ 13.11 and 13.12.

(3) b should be large, to minimize the term after the ± sign in
(13.6.6). A steep slope will also minimize g.

(4) $(1/n_U + 1/n_S)$ should be small. That is, as many responses as
possible should be obtained. For a fixed total number of responses
$(1/n_U + 1/n_S)$ is at a minimum when $n_U = n_S$ so a symmetrical
design (see § 13.1) is preferable.

(5) $(\bar{y}_U - \bar{y}_S)$ should be small because it occurs after the ± sign in
(13.6.6). That is, the size of the responses to standard and unknown
should be as similar as possible. The assay will be more precise
if a good guess at its result is made beforehand.

(6) $\sum_{S,U} \sum (x - \bar{x})^2$ should be large. That is, the doses should be as far
apart as possible, making $(x - \bar{x})$ large; but the responses must,
of course, remain on the straight part of the response-log dose
curve.
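
Point (4) of the list above is easily verified by brute force. The short Python sketch below is illustrative only; the total of 24 responses is an assumed figure and the function name is hypothetical. It evaluates $(1/n_U + 1/n_S)$ for every possible split of a fixed total and picks the smallest.

```python
def allocation_term(n_u, total):
    """Value of 1/n_U + 1/n_S when n_S = total - n_U."""
    n_s = total - n_u
    return 1.0 / n_u + 1.0 / n_s

total = 24                                    # assumed total number of responses
terms = {n_u: allocation_term(n_u, total) for n_u in range(1, total)}
best_n_u = min(terms, key=terms.get)          # split giving the narrowest limits
```

The minimum falls at the equal split, $n_U = n_S$, and the term rises steeply as the allocation becomes lopsided.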

13.7. The theory of parallel line assays. Testing for non-validity 

This discussion is perfectly general for any parallel line assay with at
least 2 dose levels for both standard and unknown, i.e. (2+2) and
larger assays. The simplifications possible in the case of symmetrical
assays are described in §§ 13.8 and 13.9 and numerical examples are
worked out in §§ 13.11-13.13. The (k+1), e.g. (2+1), dose assay is
discussed in § 13.15.

In the discussion in § 13.1 it was pointed out that it will be required
to test whether the slope of the response-log dose lines differs from
zero ('linear regression' as in § 12.5), and whether there is reason to
believe that the lines are not parallel. If more than 2 dose levels are used
for either standard or unknown it will also be possible to test the
hypothesis that the lines are really straight. The method of doing these
tests will now be outlined.

Each dose level gives rise to a group of comparable observations
and these can be analysed using an analysis of variance appropriate
to the design of the assay, the dose levels being the 'treatments'
of Chapter 11, as discussed in §§ 12.6 and 13.1. For example, for a
(2+2) dose assay there are 4 (= k, say) 'treatments' (high and low standard,
and high and low unknown), and a (3+4) dose assay has k = 7
'treatments'. The number of degrees of freedom for the 'between treatments
sum of squares' will be one less than the number of 'treatments' (or
'doses'), i.e. at least 3 as this section deals only with (2+2) dose or
larger assays (cf. § 11.4). Now the 'between treatments' (or 'between
doses') sum of squares can be subdivided into components just as in
§ 12.6. This partition can be done in many different ways (see § 13.8
and, for example, Mather (1951), and Brownlee (1965, p. 517)), but
only one of these ways is of interest. Each component must be
uncorrelated with all others and this will be demonstrated, in the case of
symmetrical assays, in § 13.8. Three components, each with one degree
of freedom, can always be separated: (a) linear regression, (b) deviations
from parallelism, and (c) difference between standard and unknown
responses, as described in § 13.1. If there are more than 3 degrees of
freedom (i.e. more than 4 'treatments') the remainder can be lumped
together as 'deviations from linearity' (cf. § 12.6), as in Table 13.7.1,
or further subdivided as in §§ 13.10 and 13.12. The analysis thus has
the appearance of Table 13.7.1 if there are k dose levels ('treatments') and
N responses altogether.



Table 13.7.1

Source of variation                     Degrees of freedom    Sum of squares

Linear regression                       1                     A
Deviations from parallelism             1                     B
Between standard and unknown            1                     C
Deviations from linearity               k-4                   D-(A+B+C)

Between 'treatments' or dose levels     k-1                   D
Error (within 'treatments')             N-k

Total                                   N-1


The bottom part of the analysis would look like Table 11.6.1 or Table 
11.8.2 if a randomized block or Latin square design (respectively) were 
used. 

(1) Linear regression. To test whether the population value of the
slope differs from zero, the appropriate sum of squares (SSD) is, from
(13.4.5) by analogy with (12.3.4),

$$\text{SSD for linear regression} =
\frac{\left[ \sum y_S (x_S - \bar{x}_S) + \sum y_U (x_U - \bar{x}_U) \right]^2}{\sum (x_S - \bar{x}_S)^2 + \sum (x_U - \bar{x}_U)^2}. \tag{13.7.1}$$

(2) Deviations from parallelism. To test whether the lines are parallel
it seems reasonable to calculate the difference between (a) the total
sum of squares for linear regression for lines fitted separately to
standard and unknown (from (12.3.4)), and (b) the sum of squares for
linear regression when the slopes are averaged (i.e. (13.7.1)), because
this difference will be zero if the lines are parallel. Thus

$$\text{SSD for deviations from parallelism} =
\frac{\left[ \sum y_S (x_S - \bar{x}_S) \right]^2}{\sum (x_S - \bar{x}_S)^2}
+ \frac{\left[ \sum y_U (x_U - \bar{x}_U) \right]^2}{\sum (x_U - \bar{x}_U)^2}
- \text{SSD for linear regression}. \tag{13.7.2}$$

(3) Between standard and unknown. This is found directly from
(11.4.5) as

$$\text{SSD between S and U} = \frac{(\sum y_S)^2}{n_S} + \frac{(\sum y_U)^2}{n_U} - \frac{(\sum y)^2}{N}. \tag{13.7.3}$$

(4) Deviations from linearity. This is found as the difference between 
the sum of squares between 'treatments' (from (11.4.5)), as in Table 
11.4.3, and the total of the above 3 components. It must be zero for 
a (2+2) dose assay (k = 4) when the sum of (13.7.1), (13.7.2), and 
(13.7.3) can be shown to add up to the sum of squares between treat- 
ments. 

A numerical example of the use of these relations is worked out in 
§13.13. 
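
The partition described in this section can be tried out on a small invented (2+2) dose assay. The Python sketch below is illustrative only: the responses are made up, the log doses are coded 0 (low) and 1 (high), the function names are hypothetical, and the 'between S and U' component is computed with the usual between-groups sum of squares, which is an assumption about the exact form intended in (13.7.3). It checks that the three components add up to the sum of squares between doses, as they must when k = 4.

```python
def sums_about_mean(x, y):
    """Return (sum of y*(x - xbar), sum of (x - xbar)^2)."""
    x_bar = sum(x) / len(x)
    sp = sum(yi * (xi - x_bar) for xi, yi in zip(x, y))
    ssx = sum((xi - x_bar) ** 2 for xi in x)
    return sp, ssx

def validity_components(x_s, y_s, x_u, y_u):
    """SSD for regression, parallelism, and between preparations."""
    sp_s, ssx_s = sums_about_mean(x_s, y_s)
    sp_u, ssx_u = sums_about_mean(x_u, y_u)
    n_total = len(y_s) + len(y_u)
    correction = (sum(y_s) + sum(y_u)) ** 2 / n_total
    regression = (sp_s + sp_u) ** 2 / (ssx_s + ssx_u)
    parallelism = sp_s ** 2 / ssx_s + sp_u ** 2 / ssx_u - regression
    preparations = sum(y_s) ** 2 / len(y_s) + sum(y_u) ** 2 / len(y_u) - correction
    return regression, parallelism, preparations

def between_doses(groups):
    """Sum of squares between dose levels ('treatments')."""
    all_y = [y for group in groups for y in group]
    correction = sum(all_y) ** 2 / len(all_y)
    return sum(sum(group) ** 2 / len(group) for group in groups) - correction

# Invented responses: low and high standard, low and high unknown.
ls, hs, lu, hu = [10, 12], [20, 22], [14, 16], [26, 28]
A, B, C = validity_components([0, 0, 1, 1], ls + hs, [0, 0, 1, 1], lu + hu)
D = between_doses([ls, hs, lu, hu])
```

For an assay with more than four dose levels the difference D - (A + B + C) would be the sum of squares for deviations from linearity.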

13.8. The theory of symmetrical parallel line assays. Use of 
orthogonal contrasts to test for non-validity 

Numerical examples are given in §§ 13.11 and 13.12. Symmetrical 
(in this context) means, summarizing the definition in § 13.1, 

n = number of responses at each of the k dose levels (see § 13.7), the
same for all,

N = kn = total number of responses,

$k_S = k_U = \tfrac{1}{2}k$ = number of dose levels for standard and for
unknown (same for both),                                        (13.8.1)

D = ratio between each dose and the one below it. The same for all
doses, and for standard and unknown (see also §§ 13.1 and 13.2
and Fig. 13.8.1), so the doses are equally spaced (by log D) on
the logarithmic scale.

The symmetrical (2+2) dose assay. Contrasts

There are k = 4 dose levels, low standard (LS), high standard (HS),
low unknown (LU) and high unknown (HU). The k - 1 = 3 degrees of
freedom between dose levels can be separated into 3 components as
described in §§ 13.1 and 13.7 (Table 13.7.1). A simpler approach than
that in § 13.7 is possible.




As usual a hypothesis is formulated. Then the probability that
observations would be made, deviating from the hypothesis by as much as, or
more than, the experimental results do, if the hypothesis were in fact
true, is calculated (cf. § 6.1).

(1) Linear regression. From Fig. 13.8.1 it is clear that if the null
hypothesis (that the true value, $\beta$, of the average slope, see § 13.4, is
zero) were true then, in the long run, the responses to the high doses
would be the same as those to low doses, i.e. $\bar{y}_{HU} + \bar{y}_{HS} = \bar{y}_{LU} + \bar{y}_{LS}$.
It follows that if the regression contrast, $L_1$, is defined as

$$L_1 = -\sum y_{LS} + \sum y_{HS} - \sum y_{LU} + \sum y_{HU} \tag{13.8.2}$$

(a linear combination of the responses), it will be a measure of departure
from the null hypothesis. If the null hypothesis were true the population
mean value of $L_1$ would be zero (as long as each dose level is given the
same number of times so the total responses can be used in place of the
mean responses). In a small experiment $L_1$ would not be exactly zero
even if the null hypothesis were true, and it is shown in § 13.9 how to
judge whether $L_1$ is large enough for rejection of the null hypothesis.

[Fig. 13.8.1. The symmetrical 2+2 dose parallel line assay. Circles: mean
of n observed responses (e.g. $\bar{y}_{HU}$ is the mean of the n responses to
$x_{HU}$). Straight lines between observed points have slope $b_U$ for unknown
and $b_S$ for standard. Best fitting parallel lines have slope b (= average
of $b_S$ and $b_U$, see § 13.4). Horizontal axis: log dose (x).]

(2) Deviations from parallelism. From Fig. 13.8.1 it is clear that if
the null hypothesis (that the population lines are parallel, $\beta_S = \beta_U$, see
§ 13.4) were true then, in the long run, $\bar{y}_{HU} - \bar{y}_{LU} = \bar{y}_{HS} - \bar{y}_{LS}$. Therefore
deviations from parallelism are measured, as above, by a deviations
from parallelism contrast, $L'_1$, defined as

$$L'_1 = \sum y_{LS} - \sum y_{HS} - \sum y_{LU} + \sum y_{HU}. \tag{13.8.3}$$

Again the population value of $L'_1$ will be zero if the null hypothesis is
true.

(3) Between standard and unknown preparations. If the null
hypothesis that the population mean response to standard is the same as
that for unknown were true then, in the long run, $\bar{y}_{LS} + \bar{y}_{HS} = \bar{y}_{LU} + \bar{y}_{HU}$.
Departure from the null hypothesis is therefore measured by the
between S and U (or between preparations) contrast, $L_p$, defined as

$$L_p = -\sum y_{LS} - \sum y_{HS} + \sum y_{LU} + \sum y_{HU}, \tag{13.8.4}$$

which will have a population mean of zero if the null hypothesis is
true.

These contrasts are used for calculation of the analysis of variance 
and potency ratio, as described below and in §§ 13.9 and 13.10. 

The subdivision of a sum of squares (cf. § 13.7) using contrasts
is quite a general process described, for example, by Mather (1951) and
Brownlee (1965, p. 517). The set of contrasts used must satisfy two
conditions.

(1) The sum of the coefficients of the contrast must be zero. In
Table 13.8.1 the coefficients (which will be denoted $\alpha$) of the response
totals for the contrasts defined in (13.8.2), (13.8.3), and (13.8.4) are
summarized. In each case $\sum \alpha = 0$ as required. This means that the
population mean value of the contrast will be zero when the null
hypothesis is true.†

(2) Each contrast must be independent of every other. A set of
mutually independent contrasts is described as a set of orthogonal
contrasts. It is easily shown (e.g. Brownlee (1965, p. 518)) that two
contrasts will be uncorrelated (and therefore, because a normal
distribution is assumed, independent, see § 2.4) when the sum of products
of corresponding coefficients for the two contrasts is zero. All results
necessary for the proof have been given in § 2.7. It is shown in the
lower part of Table 13.8.1 that this condition is fulfilled for all three
possible pairs of contrasts.

† In the language of Appendix 1, $E[L] = E[\sum \alpha_j T_j]$ where $T_j$ is the total of the n
responses of the jth treatment (dose). If all the observations were from a single
population, all $E[T_j] = n\mu$ where $E[y] = \mu$. Thus $E[L] = n\mu \sum \alpha = 0$ if $\sum \alpha = 0$.

Table 13.8.1

The upper part summarizes the coefficients ($\alpha$) of the response totals for
the validity tests for the symmetrical (2+2) dose assay. The lower part
demonstrates the orthogonality (i.e. independence) of the contrasts

                                 Σy_LS   Σy_HS   Σy_LU   Σy_HU    Σα    Σα²

Linear regression, L_1            -1      +1      -1      +1      0     4
Parallelism, L'_1                 +1      -1      -1      +1      0     4
Preparations (S and U), L_p       -1      -1      +1      +1      0     4

Products of coefficients                                         Total

α_1 α'_1                          -1      -1      +1      +1      0
α_1 α_p                           +1      -1      -1      +1      0
α'_1 α_p                          -1      +1      -1      +1      0

These conditions mean that if two contrasts are defined, to measure
linear regression and deviations from parallelism, say, there is no
choice about the third, which happens to measure the difference
between $\bar{y}_S$ and $\bar{y}_U$.
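
The two conditions on a set of contrasts are easily checked mechanically. The following Python sketch is illustrative; the dictionary keys and function names are hypothetical labels chosen for the example, and the coefficients are listed in the order LS, HS, LU, HU.

```python
from itertools import combinations

# Coefficients of the response totals, in the order LS, HS, LU, HU.
contrasts = {
    "L1 (linear regression)": (-1, +1, -1, +1),
    "L1' (parallelism)":      (+1, -1, -1, +1),
    "Lp (preparations)":      (-1, -1, +1, +1),
}

def sums_to_zero(coefficients):
    """Condition (1): the coefficients of a contrast add up to zero."""
    return sum(coefficients) == 0

def orthogonal(c1, c2):
    """Condition (2): sum of products of corresponding coefficients is zero."""
    return sum(a * b for a, b in zip(c1, c2)) == 0

all_sum_to_zero = all(sums_to_zero(c) for c in contrasts.values())
all_orthogonal = all(orthogonal(contrasts[p], contrasts[q])
                     for p, q in combinations(contrasts, 2))
sum_of_squares = {name: sum(a * a for a in c) for name, c in contrasts.items()}
```

Each contrast here has a sum of squared coefficients equal to 4, the divisor that is needed when a contrast is converted into a sum of squares.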

The symmetrical (3 + 3) dose assay 

There are k = 6 levels, say S1, S2, S3, U1, U2, and U3, where S1 is
the lowest, S2 the middle, and S3 the highest standard dose. There are
k - 1 = 5 degrees of freedom between dose levels so after separating
components for linear regression, deviations from parallelism and
between S and U, there are two degrees of freedom left for deviations
from linearity (see Table 13.7.1). The first three contrasts are
constructed from response totals by the same sort of reasoning as for
the (2+2) assay and the coefficients ($\alpha$) are given in Table 13.8.2.
Deviations from linearity can be further divided into two components
each with one degree of freedom. If the average curve for S and U is
straight then (for a symmetrical assay) the responses to the middle
doses will be equal to the mean of the responses to the high and low
doses, i.e., in the long run $(\bar{y}_{S1} + \bar{y}_{S3} + \bar{y}_{U1} + \bar{y}_{U3})/4 = (\bar{y}_{S2} + \bar{y}_{U2})/2$.




Therefore a deviation from linearity contrast, $L_2$, measuring departure
from the hypothesis of straightness, can be defined as

$$L_2 = \sum y_{S1} - 2\sum y_{S2} + \sum y_{S3} + \sum y_{U1} - 2\sum y_{U2} + \sum y_{U3} \tag{13.8.5}$$

and this will be zero in the long run if the average line is straight. The
fifth contrast is dictated by the conditions mentioned above. It is
called $L'_2$, and inspection of the coefficients in Table 13.8.2 shows that
it is a measure of the extent to which deviations from linearity are the
same for the standard and unknown. It is therefore called the difference
of curvature contrast (cf. $L'_1$ which measures the extent to which
the linear regressions differ between S and U, i.e. deviations from
parallelism).

Table 13.8.2

Coefficients ($\alpha$) of the response totals for the orthogonal contrasts in a
symmetrical (3+3) dose assay

                          Response totals
Contrast    Σy_S1   Σy_S2   Σy_S3   Σy_U1   Σy_U2   Σy_U3    Σα    Σα²

L_1          -1       0      +1      -1       0      +1      0      4
L'_1         +1       0      -1      -1       0      +1      0      4
L_p          -1      -1      -1      +1      +1      +1      0      6
L_2          +1      -2      +1      +1      -2      +1      0     12
L'_2         -1      +2      -1      +1      -2      +1      0     12

It can easily be checked that, as in Table 13.8.1, the sum of the products of the
coefficients of corresponding totals is zero for all possible pairs of contrasts, so
all pairs of contrasts are orthogonal.
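
The coefficients for the (3+3) dose assay can be checked in the same way. The Python sketch below is illustrative, with hypothetical variable and function names. It also brings out a pattern visible in the coefficients themselves (an observation about the table, not a construction stated in the text): the parallelism and difference of curvature contrasts are the term-by-term products of the preparations contrast with the regression and linearity contrasts respectively.

```python
from itertools import combinations

# Coefficients in the order S1, S2, S3, U1, U2, U3.
L1 = (-1, 0, +1, -1, 0, +1)        # linear regression
Lp = (-1, -1, -1, +1, +1, +1)      # between preparations
L2 = (+1, -2, +1, +1, -2, +1)      # deviation from linearity

def product(c1, c2):
    """Term-by-term product of two sets of coefficients."""
    return tuple(a * b for a, b in zip(c1, c2))

L1_prime = product(Lp, L1)         # deviations from parallelism
L2_prime = product(Lp, L2)         # difference of curvature

contrasts = [L1, L1_prime, Lp, L2, L2_prime]
all_sum_to_zero = all(sum(c) == 0 for c in contrasts)
all_orthogonal = all(sum(product(c1, c2)) == 0
                     for c1, c2 in combinations(contrasts, 2))
sums_of_squares = [sum(a * a for a in c) for c in contrasts]
```

The five contrasts account for all five degrees of freedom between the six dose levels.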

13.9. The theory of symmetrical parallel line assays. Use of
contrasts in the analysis of variance

The notation used is defined in § 2.1 and (13.8.1). In conformity with
the usual approach in the analysis of variance, it is required to calculate
from each contrast a quantity (the mean square) that will be an estimate
of the error variance, $s^2[y]$, if the appropriate null hypothesis is true.
These estimates will then be compared with the error variance (which
estimates $\sigma^2$ whether or not the null hypotheses are true), in the usual
way (see § 11.4). Numerical examples are given in §§ 13.11 and 13.12.

The first step is to estimate the variance of a contrast. If $T_j$ is used
to stand for the total of the n responses to the jth dose level, then the
contrasts defined in § 13.8 all have the form

$$L = \sum \alpha_j T_j. \tag{13.9.1}$$

The variance of this, from (2.7.10), is $\sum \alpha^2 \operatorname{var}(T_j)$ and from (2.7.4) it
follows that $\operatorname{var}(T_j) = n s^2[y]$ where $s^2[y]$ is the estimated variance of
the observations and n is the number of observations in each total.
Thus

$$\operatorname{var}[L] = n s^2[y] \sum \alpha^2. \tag{13.9.2}$$

The values of $\sum \alpha^2$ are worked out in Tables 13.8.1 and 13.8.2.

It might be supposed that it is not possible to estimate the variance
of L directly from the observed scatter of values of L, because there is
only a single experimentally observed value for each contrast. However,
it is what happens when the null hypothesis is true that is of interest,
and it was shown in § 13.8 that when it is true the population mean
value of each L will be zero. Now it was pointed out in § 2.6 (eqn.
(2.6.3)) that if there are N observations of y, and if the population
mean value ($\mu$) of y is known, then the estimate of the variance of y is
$\sum(y-\mu)^2/N$ (the divisor is only N−1 when the sample mean is used in
place of $\mu$). For a single value of L it follows that, on the null hypothesis,

$$\mathrm{var}[L] = (L-0)^2/1 = L^2. \qquad (13.9.3)$$

Equating (13.9.2) and (13.9.3) shows that when the null hypothesis
(that the population value of L is zero) is true, an estimate of the error
variance is provided by

$$s^2[y] = \frac{L^2}{n\sum\alpha^2} \qquad (13.9.4)$$

and this expression also gives the sum of squares required for the
analysis of variance, because each sum of squares has one degree of
freedom (see §§ 13.7, 13.8, 13.11, and 13.12), so the sum of squares is
the same as the mean square.
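As a minimal sketch of how (13.9.1) and (13.9.4) translate into arithmetic, the two helper functions below (hypothetical names, not from the text) compute a contrast from a set of response totals and then its sum of squares. The totals used in the example are invented purely for illustration.

```python
def contrast_value(alphas, totals):
    """Eqn (13.9.1): L = sum of alpha_j * T_j over the dose levels."""
    return sum(a * t for a, t in zip(alphas, totals))


def contrast_sum_of_squares(alphas, totals, n):
    """Eqn (13.9.4): L^2 / (n * sum of alpha^2), with one degree of freedom."""
    L = contrast_value(alphas, totals)
    return L ** 2 / (n * sum(a * a for a in alphas))


# Hypothetical totals (NOT data from the text), each the sum of n = 2 responses.
example_totals = (10.0, 14.0, 11.0, 17.0)
example_ss = contrast_sum_of_squares((-1, 1, -1, 1), example_totals, n=2)
```

Here L = 10 and the sum of squares is 100/(2 × 4) = 12.5. Because the contrast has one degree of freedom, this value is also the mean square.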

It is not difficult to show (try it) that, when the appropriate base is 
used for the logarithms giving (13.2.6), (13.2.7), (13.2.12), and (13.2.13),
the sums of squares for testing validity given by the general formulas 
(13.7.1), (13.7.2), and (13.7.3) are the same as those given by (13.9.4), 
using the definitions of the contrasts in § 13.8. The demonstrations 
follow the lines used in the next section. 






13.10. The theory of symmetrical parallel line assays. Simplified 
calculation of the potency ratio and its confidence limits 

The general results in §§ 13.3, 13.4, and 13.6 can be simplified when 
the appropriate dose metameter is used (see § 13.2). The notation, and 
the definition of symmetrical, are given in (13.8.1). Numerical examples 
are given in §§ 13.11 and 13.12. Try to suspend your belief that this is 
a very complicated sort of simplification until you have compared the 
calculations for symmetrical assays (in §§ 13.11 and 13.12) with those 
for an unsymmetrical assay (§13.13). 



The symmetrical (2+2) dose assay

The best dose metameter in this case was shown in § 13.2 to be
$x = \log_r z$ where z = dose and $r = \sqrt{D}$. The consequences of using this
base for the logarithms, derived in § 13.2, can be used to simplify the
ratio $(\bar{y}_U - \bar{y}_S)/b$ which occurs in the potency ratio and its confidence
limits. Taking the numerator first, $(\bar{y}_U - \bar{y}_S)$ is, as expected, simply
related to the between-preparations contrast, $L_p$. Thus, from (13.8.1)
and (13.8.4),

$$(\bar{y}_U - \bar{y}_S) = \frac{\sum y_U}{n_U} - \frac{\sum y_S}{n_S} = \frac{\sum y_{LU} + \sum y_{HU} - \sum y_{LS} - \sum y_{HS}}{N/2} = \frac{2L_p}{N}. \qquad (13.10.1)$$

The average slope, b (see § 13.4), is related to the regression contrast,
$L_1$, as expected. From (13.4.5), (13.2.6), (13.2.8), and (13.8.2) it follows
that

$$b = \frac{\sum y_S(x_S - \bar{x}_S) + \sum y_U(x_U - \bar{x}_U)}{\sum(x_S - \bar{x}_S)^2 + \sum(x_U - \bar{x}_U)^2}$$

$$= \frac{(x_{LS} - \bar{x}_S)\sum y_{LS} + (x_{HS} - \bar{x}_S)\sum y_{HS} + (x_{LU} - \bar{x}_U)\sum y_{LU} + (x_{HU} - \bar{x}_U)\sum y_{HU}}{\sum_{S,U}\sum(x - \bar{x})^2}$$

$$= \frac{-\sum y_{LS} + \sum y_{HS} - \sum y_{LU} + \sum y_{HU}}{N} = \frac{L_1}{N}. \qquad (13.10.2)$$

Combining (13.10.1) and (13.10.2) gives

$$(\bar{y}_U - \bar{y}_S)/b = 2L_p/L_1. \qquad (13.10.3)$$




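Furthermore, from (13.2.5),

$$\bar{x}_S - \bar{x}_U = x_{LS} - x_{LU} = \log_r z_{LS} - \log_r z_{LU} = \log_r(z_{LS}/z_{LU}) \qquad (13.10.4)$$

and from (13.3.6)

$$(\bar{x}_S - \bar{x}_U)\log_{10} r = \log_{10}(z_{LS}/z_{LU}). \qquad (13.10.5)$$

The potency ratio

Substituting (13.10.3) and (13.10.5) into the general formula for the
log potency ratio, (13.3.7), gives

$$\log_{10} R = \log_{10}\left(\frac{z_{LS}}{z_{LU}}\right) + \frac{2L_p}{L_1}\log_{10} r.$$

Putting $r = \sqrt{D}$ (remembering that $\log\sqrt{D} = \log D^{1/2} = \tfrac{1}{2}\log D$) and
taking antilogarithms gives

$$R = \left(\frac{z_{LS}}{z_{LU}}\right)\mathrm{antilog}_{10}\left[\frac{L_p}{L_1}\log_{10} D\right]. \qquad (13.10.6)$$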

The confidence limits 

It was mentioned in § 13.6 that $(\bar{y}_U - \bar{y}_S)$ is not correlated with b and
so $v_{12} = 0$. This follows (using (13.10.1) and (13.10.2)) from the fact
that $L_p$ and $L_1$ were shown in § 13.8 to be uncorrelated.
From (13.6.2) and (13.8.1),

$$v_{11} = (1/n_U + 1/n_S) = (2/N + 2/N) = 4/N. \qquad (13.10.7)$$

And from (13.6.3) and (13.2.8),

$$v_{22} = 1\Big/\sum_{S,U}\sum(x - \bar{x})^2 = 1/N. \qquad (13.10.8)$$

Substituting (13.10.2), (13.10.3), (13.10.5), (13.10.7), (13.10.8) and
$r = \sqrt{D}$ (again $\log\sqrt{D} = \tfrac{1}{2}\log D$) into the general formula for the
confidence limits for $\log_{10} R$ (13.6.6), and taking antilogarithms, gives
the confidence limits for the population value of R, the potency ratio
in a symmetrical (2+2) dose assay, as

$$\left(\frac{z_{LS}}{z_{LU}}\right)\mathrm{antilog}_{10}\left\{\left[\frac{L_p/L_1}{1-g} \pm \frac{st}{L_1(1-g)}\sqrt{N(1-g) + N(L_p/L_1)^2}\right]\log_{10} D\right\} \qquad (13.10.9)$$



where, from (13.6.5), (13.10.2), and (13.10.8),

$$g = \frac{Ns^2t^2}{L_1^2}. \qquad (13.10.10)$$



If g is very small, so that $(1-g) \simeq 1$, then (13.10.9) can be further
simplified. As explained in § 13.6, this is equivalent to treating $\log_{10} R$
as approximately normally distributed, and calculating confidence
limits for its population value as $\log_{10} R \pm t\,s[\log_{10} R]$ as in § 7.4,
where the approximate standard deviation of $\log_{10} R$, from (13.10.9)
(or from (13.6.8)), is

$$s[\log_{10} R] \simeq \frac{\log_{10} D}{L_1}\sqrt{s^2 N\left[1 + (L_p/L_1)^2\right]}. \qquad (13.10.11)$$

A numerical example of the use of (13.10.9) and (13.10.11) is given in
§ 13.11.

The symmetrical (3+3) dose assay 

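The simplifications follow exactly the same lines as those just
described. From (13.8.1) and the definitions of the contrasts in Table
13.8.2,

$$(\bar{y}_U - \bar{y}_S) = \frac{\sum y_U}{n_U} - \frac{\sum y_S}{n_S} = \frac{\sum y_{U1} + \sum y_{U2} + \sum y_{U3} - \sum y_{S1} - \sum y_{S2} - \sum y_{S3}}{N/2} = \frac{2L_p}{N} \qquad (13.10.12)$$

and, from the general definition of the slope b, (13.4.5), using (13.2.12),
(13.2.14), and the definition of $L_1$ in Table 13.8.2,

$$b = \frac{1}{\sum_{S,U}\sum(x - \bar{x})^2}\Big[(x_{S1} - \bar{x}_S)\sum y_{S1} + (x_{S2} - \bar{x}_S)\sum y_{S2} + (x_{S3} - \bar{x}_S)\sum y_{S3}$$
$$\qquad + (x_{U1} - \bar{x}_U)\sum y_{U1} + (x_{U2} - \bar{x}_U)\sum y_{U2} + (x_{U3} - \bar{x}_U)\sum y_{U3}\Big]$$

$$= \frac{-\sum y_{S1} + \sum y_{S3} - \sum y_{U1} + \sum y_{U3}}{2N/3} = \frac{3L_1}{2N}. \qquad (13.10.13)$$

Combining (13.10.12) and (13.10.13) gives

$$(\bar{y}_U - \bar{y}_S)/b = 4L_p/3L_1. \qquad (13.10.14)$$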




Furthermore, from (13.2.11),

$$(\bar{x}_S - \bar{x}_U) = (x_{S1} - x_{U1}) = \log_r z_{S1} - \log_r z_{U1} = \log_r(z_{S1}/z_{U1}) \qquad (13.10.15)$$

so, from (13.3.6),

$$(\bar{x}_S - \bar{x}_U)\log_{10} r = \log_{10}(z_{S1}/z_{U1}). \qquad (13.10.16)$$

Substituting these results, together with r = D (see § 13.2), into the
general formulas, as above, gives the potency ratio, from (13.3.7), as

$$R = \left(\frac{z_{S1}}{z_{U1}}\right)\mathrm{antilog}_{10}\left(\frac{4L_p}{3L_1}\log_{10} D\right). \qquad (13.10.17)$$

Confidence limits for the population value of R from (13.6.6) (with
$v_{11} = 4/N$, $v_{22} = 3/(2N)$) are

(13.10.18)

where

$$g = \frac{2Ns^2t^2}{3L_1^2}. \qquad (13.10.19)$$

Again, if g is small, so that $(1-g) \simeq 1$, further simplification is possible.
As explained above and in § 13.6 the confidence limits for the population
value of $\log_{10} R$ can be found, as in § 7.4, from $\log_{10} R \pm t\,s[\log_{10} R]$
where the approximate standard deviation of $\log_{10} R$, from (13.10.18)
or (13.6.8), can be written in the form

A numerical example of the use of (13.10.18) and (13.10.20) is given in 
§ 13.12. 

13.11. A numerical example of a symmetrical (2+2) dose
parallel line assay

The results of a symmetrical assay of (+)-tubocurarine based on a
randomized block design (see § 11.6) are shown in Table 13.11.1. The
mean responses are plotted in Fig. 13.11.1. The response, y, was the
percentage inhibition, caused by each dose, of the contraction (induced






by stimulation of the phrenic nerve) of the isolated rat diaphragm.
The four doses (or 'treatments') were allotted arbitrarily to the numbers
0, 1, 2, 3 as described in § 2.3:

Dose 0 = LU = 0.28 ml of unknown solution,
     1 = HU = 0.32 ml of unknown solution,
     2 = HS = 16.0 µg of pure (+)-tubocurarine,
     3 = LS = 14.0 µg of pure (+)-tubocurarine.

Each dose was given four times, so sixteen doses were given altogether. 
The doses were given in sequence to the same tissue (see § 13.1, p. 286), 
the blocks, in this case, corresponding to periods of time. The analysis 

Table 13.11.1

Responses to (+)-tubocurarine. The doses were given in random order in
each block (time period) as specified in the text, not in the order shown in
the table

                  Treatment
Block        LS     HS     LU     HU     Totals

1            43     62     41     61      207
2            48     62     48     68      226
3            53     66     53     70      242
4            52     70     56     72      250

Totals      196    260    198    271      925



will therefore help to eliminate error due to changes of sensitivity of 
the tissue with time (which occurred in this experiment). However it 
seems most unlikely that the responses in one block (period of time) 
will differ from the responses in another by a constant amount, as 
specified in the model (eqn. (11.2.2)) on which the analysis is based 
(see §§ 11.2 and 11.6), so the analysis should be regarded as only an 
approximation. The four doses were given in strictly random order 
(see § 2.3) in each time period, the random number tables producing 
the sequence: first block: 1, 0, 2, 3; second block 3, 1, 2, 0; third block, 
0, 3, 1, 2; fourth block, 1, 0, 3, 2. 
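The random allocation of dose order within each block was done in the text with random number tables. A present-day equivalent, sketched here under the assumption that a pseudo-random generator is an acceptable substitute for tables, is shown below. The function name `randomize_blocks` and the seed are arbitrary choices for illustration and will not reproduce the particular sequences quoted above.

```python
import random


def randomize_blocks(treatments, n_blocks, seed=None):
    """Return an independent random ordering of the treatments for each block."""
    rng = random.Random(seed)
    # sample() with k equal to the list length gives a random permutation.
    return [rng.sample(treatments, k=len(treatments)) for _ in range(n_blocks)]


# Four doses numbered 0-3 (LU, HU, HS, LS) and four blocks, as in this assay.
orders = randomize_blocks([0, 1, 2, 3], n_blocks=4, seed=1971)
```

Each element of `orders` contains every dose exactly once, so every block is a complete replicate given in its own random order.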

The assumptions involved in the analysis (normal distribution of 
errors, equal scatter in all groups, size of response not affected by pre- 
vious responses, additivity etc.) have been discussed in §§11.2 and 
13.1, p. 279. The analysis is the same as that for randomized block 






experiments (§ 11.6), with the addition that the between treatment 
sum of squares can be split into components as described in § 13.7. 
Because this assay is symmetrical (see (13.8.1)) the arithmetic can be 
simplified using the results in §§ 13.8, 13.9 and 13.10. Remember that 




Fig. 13.11.1. Results of symmetrical 2+2 dose assay from Table 13.11.1.
Abscissa: $\log_{10}$ dose.

○ Observed mean responses.
— Least squares lines constrained to be parallel (i.e. with
mean slope, see §§ 13.4, 13.10 and calculations at end of
this section).

Notice break on abscissa. The question of the units for the potency ratio,
50.64, is discussed later in this section.

the assumptions discussed in §§ 4.2, 7.2, 11.2, and 12.2 have not been 
tested, so the results are more uncertain than they appear. 

The analysis of variance of the response (y) 

The figures in Table 13.11.1 are actually identical with those in Table 
11.6.2 which were used to illustrate the randomized block analysis. 
The lower part of the analysis of variance (Table 13.11.2) is therefore 
identical with Table 11.6.3. The calculations were described on p. 199. 
(A similar example is worked out in § 13.12.) All that remains to be
done is to partition the between-treatment sum of squares, 1188.6875,
using the simplifications made possible by the symmetry of the assay.

(a) Linear regression. The linear regression contrast defined in
(13.8.2) (or by the coefficients in Table 13.8.1) is found, using the
response totals from Table 13.11.1, to be

$$L_1 = -196 + 260 - 198 + 271 = 137.$$




The sum of squared deviations (SSD) for linear regression is found
using eqn. (13.9.4):

$$\mathrm{SSD} = \frac{L_1^2}{n\sum\alpha^2} = \frac{137^2}{4 \times 4} = 1173.0625.$$

In this expression n is the number of responses in each total used
in the calculation of L (see (13.8.1)), and $\sum\alpha^2$ is the sum of the squares
of the coefficients of the response totals in L (given in Table 13.8.1;
in the particular case of the (2+2) dose assay it is 4 for all 3 contrasts).

(b) Deviations from parallelism. The deviations from parallelism
contrast defined in (13.8.3) is

$$L_1' = 196 - 260 - 198 + 271 = 9.$$

The sum of squares is found, as above, from (13.9.4):

$$\mathrm{SSD} = \frac{9^2}{4 \times 4} = 5.0625.$$

(c) Between standard and unknown preparations. The contrast,
defined in (13.8.4), is

$$L_p = -196 - 260 + 198 + 271 = 13,$$

and the sum of squares, using (13.9.4) as above, is

$$\mathrm{SSD} = \frac{13^2}{4 \times 4} = 10.5625.$$

(d) Check on arithmetical accuracy. The sum of the three components
is 1173.0625 + 5.0625 + 10.5625 = 1188.6875, the same as the between
treatments sum of squares which was calculated independently.

These results are assembled in the analysis of variance, Table 13.11.2.
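The partition in (a) to (d) can be reproduced in a few lines. This is only a sketch: the response totals are those of Table 13.11.1, the coefficients are those used in the arithmetic above, and the names `contrasts`, `ssd` and `components` are illustrative rather than taken from the text.

```python
# Response totals from Table 13.11.1; each is the sum of n = 4 responses.
totals = {"LS": 196, "HS": 260, "LU": 198, "HU": 271}
order = ("LS", "HS", "LU", "HU")
n = 4

# Coefficients of the totals, in the order LS, HS, LU, HU.
contrasts = {
    "linear regression": (-1, 1, -1, 1),
    "deviations from parallelism": (1, -1, -1, 1),
    "between preparations": (-1, -1, 1, 1),
}


def ssd(alphas):
    """Return (L, L^2 / (n * sum alpha^2)) for one contrast, eqn (13.9.4)."""
    L = sum(a * totals[k] for a, k in zip(alphas, order))
    return L, L ** 2 / (n * sum(a * a for a in alphas))


components = {name: ssd(a) for name, a in contrasts.items()}

# Independent check: between-treatments sum of squares from the totals.
grand = sum(totals.values())
between_treatments = (
    sum(t ** 2 for t in totals.values()) / n - grand ** 2 / (n * len(totals))
)
partition_total = sum(ss for _, ss in components.values())
```

The three components add up to the between-treatments sum of squares because the three contrasts are mutually orthogonal and together use all three of its degrees of freedom.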



Interpretation of the analysis of variance 

Dividing each mean square by the error mean square, in the usual
way, gives the variance ratios F. As usual all the mean squares would
be an estimate of the same variance $\sigma^2$ if all 16 observations were randomly
selected from a single population with variance $\sigma^2$. This is
the basic all-embracing form of the null hypothesis because if it were
true there would obviously be no differences between treatments,
blocks, preparations, etc. In fact, when the variance ratio for linear
regression, F = 319.3 with $f_1 = 1$ and $f_2 = 9$ degrees of freedom, is






looked up in tables of the distribution of the variance ratio, as described
in § 11.3, it is found that a value of F(1,9) as large as, or larger than,
319.3 would be very rarely (P < 0.001) observed if both 1173.0625 and
3.674 were estimates of the same variance ($\sigma^2$), i.e. if there were in
fact no tendency for the high doses to give larger responses than low
doses (see §§ 13.8 and 13.9). It is therefore preferred to reject the null
hypothesis in favour of the alternative hypothesis that response does

Table 13.11.2

Analysis of variance of responses for symmetrical (2+2) dose assay of
(+)-tubocurarine. The lower part of the analysis is identical with Table
11.6.3 which was calculated using the same figures

Source of variation             d.f.    SSD          MS           F         P

Linear regression                1      1173.0625    1173.0625    319.3     <0.001
Deviations from parallelism      1         5.0625       5.0625      1.38    >0.2
Between preps. (S and U)         1        10.5625      10.5625      2.87    0.1-0.2

Between treatments               3      1188.6875     396.229     107.85    <0.001
Between blocks                   3       270.6875      90.229      24.56    <0.001
Error                            9        33.0625       3.674

Total                           15      1492.4375









change with dose (i.e. that $\beta$, the population value of b, is not zero, cf.
§ 12.5). The logical reason for this preference was discussed in § 6.1.

Proceeding similarly for the other variance ratios shows that deviations
from parallelism such as those observed would be quite common
if the true (population) lines were parallel. The same (or larger)
deviations from parallelism would be expected in more than 20 per cent of
repeated experiments if the population lines had the same slope
($\beta_S = \beta_U$). There is therefore no evidence against the hypothesis of
parallelism.

Similarly there is little evidence that the average responses are 
different for standard and unknown. Of course it is most unlikely that 
they are exactly equipotent, but differences as large as, or larger than 
those observed would not be very uncommon if they were (see p. 93). 

There appears to be a real difference between blocks. Differences as
large as, or larger than, those observed would be expected in less than
1 in 1000 experiments if the population block means were equal; cf.






§ 11.6. Inspection of the results reveals a tendency for the responses to 
get larger with time, and the analysis suggests that this cannot be 
attributed to experimental error. The arrangement in blocks has 
therefore helped to decrease the experimental error. 

All these inferences depend on the assumptions of §§ 4.2, 7.2, 11.2,
and 12.2 being sufficiently nearly true. If they were, the conclusion
would be that there is no evidence that the assay is invalid so it is 
not unreasonable to carry on and calculate the potency ratio and its 
confidence limits. 

The potency ratio and the question of units 

The simplified result for the symmetrical (2+2) dose parallel line
assay, eqn. (13.10.6), gives the least squares estimate of the potency as

$$R = \left(\frac{z_{LS}}{z_{LU}}\right)\mathrm{antilog}_{10}\left[\frac{L_p}{L_1}\log_{10} D\right] = \left(\frac{14.0}{0.28}\right)\mathrm{antilog}_{10}\left[\frac{13.0}{137.0}\log_{10}(1.14286)\right] = 50.64\ \mu\mathrm{g/ml}.$$

D is the ratio between high and low doses (see (13.8.1)), i.e.
D = 0.32/0.28 = 16.0/14.0 = 1.14286 (so that $\log_{10} D$ = 0.05799), and the
contrasts, $L_p$ and $L_1$, have already been calculated. In § 13.3 and later
sections it was assumed that all doses were expressed in the same units.
This means that $z_{LS}/z_{LU}$, and hence R, is a dimensionless ratio. In this
case the dose of standard was given in µg, and that of unknown in ml, so
$z_{LS}/z_{LU}$ = 14.0 µg/0.28 ml = 50.0 µg/ml. If these units are used
$z_{LS}/z_{LU}$, and hence R, will have the units µg/ml, suggesting that, if these
units are used, R is actually the potency (concentration in µg/ml) of the
unknown, rather than a potency ratio. It can easily be seen that this
is so by converting standard and unknown to the same units. For
example the doses of standard could be assumed to be 16.0 ml and
14.0 ml of a 1.0 µg/ml standard solution of (+)-tubocurarine (the fact
that they are more likely, in reality, to have been 0.16 ml and 0.14 ml
of a 100 µg/ml solution does not alter the dose given). This would give
$z_{LS}/z_{LU}$ = 14.0 ml/0.28 ml = 50 (a dimensionless ratio). The potency
ratio would therefore be 50.64, as above, also a dimensionless ratio.
The concentration of the unknown is, from the definition of the
potency ratio (13.3.1), R × concentration of standard = 50.64 × 1.0 µg/ml
= 50.64 µg/ml, as found above.
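The potency calculation is short enough to check directly. The sketch below evaluates eqn (13.10.6) with the figures of this example; the function name `potency_2x2` and its argument names are illustrative only. Doses are passed in their original units (µg for standard, ml for unknown), so the result is a concentration in µg/ml, about 50.6 µg/ml, as discussed above.

```python
import math


def potency_2x2(z_ls, z_lu, l_p, l_1, dose_ratio):
    """Eqn (13.10.6): R = (z_LS / z_LU) * antilog10[(L_p / L_1) * log10 D]."""
    return (z_ls / z_lu) * 10 ** ((l_p / l_1) * math.log10(dose_ratio))


# Low doses: 14.0 ug of standard and 0.28 ml of unknown; D = 0.32/0.28.
R = potency_2x2(z_ls=14.0, z_lu=0.28, l_p=13.0, l_1=137.0,
                dose_ratio=0.32 / 0.28)
```

Passing `z_ls=14.0` and `z_lu=0.28` as volumes of a 1.0 µg/ml standard gives the same number, now read as a dimensionless potency ratio.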






Confidence limits for the potency ratio 

The simplified form of Fieller's theorem appropriate to this assay
is eqn. (13.10.9), which gives confidence limits for the population
value of the potency ratio as

$$\left(\frac{z_{LS}}{z_{LU}}\right)\mathrm{antilog}_{10}\left\{\left[\frac{L_p/L_1}{1-g} \pm \frac{st}{L_1(1-g)}\sqrt{N(1-g) + N(L_p/L_1)^2}\right]\log_{10} D\right\}$$

where $g = Ns^2t^2/L_1^2$, according to (13.10.10).

If the doses are expressed in their original units the equation will
give confidence limits for the concentration of unknown, rather than
for the potency ratio, for exactly the reasons explained above in
connection with the potency ratio calculation. In this example:

$z_{LS}/z_{LU}$ = 14.0/0.28 = 50.0 µg/ml as above;

$L_p/L_1$ = 13.0/137.0 = 0.09489;

$\log_{10} D = \log_{10}(0.32/0.28)$ = 0.05799;

$s^2[y]$ = 3.674, the error variance (from Table 13.11.2) with 9 degrees
of freedom;

t = 2.262 for P = 0.05 (as 95 per cent confidence limits are wanted)
with 9 d.f. (from tables of Student's t, see §§ 4.4 and 7.4); thus

g = 16 × 3.674 × 2.262²/137.0² = 0.0160, from equation (13.10.10),
and (1−g) = 1 − 0.016 = 0.984.

The fact that g is considerably less than one implies that the slope, b,
is much larger than its standard deviation (as inferred from the large
variance ratio for linear regression in Table 13.11.2). This means that
it is safe to use an approximate equation, based on (2.7.16), for the
variance of the log potency ratio (as discussed in §§ 13.5, 13.6, and
13.10, and illustrated below). However, it is very little trouble to use
the full equation. Substituting the above quantities into the equation
for limits gives

$$50\ \mathrm{antilog}_{10}\left[\left(\frac{0.09489}{0.984} \pm \frac{1.917 \times 2.262}{137.0 \times 0.984}\sqrt{(16 \times 0.984) + 16(0.09489)^2}\right) \times 0.05799\right]$$

= 50 antilog$_{10}$[−0.001843 and +0.01303]†

= 49.79 µg/ml and 51.52 µg/ml.

† If necessary, see p. 325 for a footnote describing how to find the antilog of a
negative number.
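The limits can be recomputed as follows. This is a sketch only: it assumes the form of (13.10.9) that is implied by the numerical substitution in this section, it takes $\log_{10}(0.32/0.28)$ = 0.05799, and the function and argument names are hypothetical.

```python
import math


def fieller_limits_2x2(z_ls, z_lu, l_p, l_1, n_total, s2, t, dose_ratio):
    """Confidence limits for a symmetrical (2+2) dose assay.

    Assumes the form of (13.10.9) read off from the worked arithmetic:
    limits = (z_LS/z_LU) * antilog10{[m/(1-g) +/- half] * log10 D},
    where m = L_p/L_1 and g = N s^2 t^2 / L_1^2 (eqn 13.10.10).
    """
    g = n_total * s2 * t ** 2 / l_1 ** 2
    m = l_p / l_1
    centre = m / (1 - g)
    half = (math.sqrt(s2) * t / (l_1 * (1 - g))) * math.sqrt(
        n_total * (1 - g) + n_total * m ** 2
    )
    log_d = math.log10(dose_ratio)
    lower = (z_ls / z_lu) * 10 ** ((centre - half) * log_d)
    upper = (z_ls / z_lu) * 10 ** ((centre + half) * log_d)
    return g, lower, upper


# Values from this example: N = 16, s^2 = 3.674 (9 d.f.), t = 2.262.
g, lower, upper = fieller_limits_2x2(
    z_ls=14.0, z_lu=0.28, l_p=13.0, l_1=137.0,
    n_total=16, s2=3.674, t=2.262, dose_ratio=0.32 / 0.28,
)
```

Because the centre of the interval is m/(1−g) rather than m, the limits are not symmetrical about the estimate even on the logarithmic scale, although with g this small the asymmetry is slight.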






Approximate confidence limits 

Because g is much less than 1 the approximate formula for the limits,
eqn. (13.10.11), can be used (see §§ 13.5, 13.6, and 13.10). Substituting
the quantities already calculated into (13.10.11) gives the estimated
standard deviation of $\log_{10} R$ as

$$s[\log_{10} R] \simeq \frac{0.05799}{137.0}\sqrt{3.674 \times 16\,(1 + 0.09489^2)} = 0.003260.$$

The approximate confidence limits are therefore, as in § 7.4,

$$\log_{10} R \pm t\,s[\log_{10} R] = \log_{10} 50.64 \pm (2.262 \times 0.003260) = 1.6971 \text{ and } 1.7119.$$

Taking antilogs gives the approximate 95 per cent Gaussian confidence
limits for the true value of R as 49.79 and 51.51 µg/ml, not much
different from the values found from the full equation, which are
themselves, of course, only approximate as explained in § 7.2.
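A sketch of the approximate calculation follows, using eqn (13.10.11) in the form $s[\log_{10} R] \simeq (\log_{10} D/L_1)\sqrt{s^2N[1 + (L_p/L_1)^2]}$. The function name is hypothetical. Because the code carries more decimal places than the hand calculation, its limits may differ from the figures quoted above in the last digit.

```python
import math


def approx_limits_2x2(z_ls, z_lu, l_p, l_1, n_total, s2, t, dose_ratio):
    """Approximate limits: log10 R +/- t * s[log10 R], eqn (13.10.11)."""
    log_d = math.log10(dose_ratio)
    m = l_p / l_1
    # Approximate standard deviation of log10 R, valid when g is small.
    sd_log_r = (log_d / l_1) * math.sqrt(s2 * n_total * (1 + m ** 2))
    log_r = math.log10(z_ls / z_lu) + m * log_d
    lower = 10 ** (log_r - t * sd_log_r)
    upper = 10 ** (log_r + t * sd_log_r)
    return sd_log_r, lower, upper


sd_log_r, approx_lower, approx_upper = approx_limits_2x2(
    z_ls=14.0, z_lu=0.28, l_p=13.0, l_1=137.0,
    n_total=16, s2=3.674, t=2.262, dose_ratio=0.32 / 0.28,
)
```

These limits are symmetrical about $\log_{10} R$, which is the main difference from the full Fieller calculation.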

Summary of the result 

There is no evidence that the assay is invalid, and the estimated
potency of the unknown tubocurarine solution is 50.64 µg/ml, with
95 per cent Gaussian confidence limits 49.79 µg/ml to 51.52 µg/ml. These
conclusions are based on the assumptions discussed in §§ 4.2, 7.2, 11.2, 
and 12.2. The confidence limits are, as usual, likely to be too narrow 
(see § 7.2). Notice that the confidence limits for R are not equally 
spaced on each side of R, unlike the limits encountered in Chapter 7. 
In fact even the limits for log R are not equally spaced on each side of 
log R unless g is small (see §§ 13.5, 13.6, and 13.10). 

How to plot results. Conversion to convenient units 

When the results of the assay are plotted, as in Fig. 13.11.1, it will
be preferable to plot the least squares lines. The calculated average
slope, b, has been found using logs to base $\sqrt{D}$ (see § 13.2) so these
must be used in plotting the graph (they can be found from logs to
base 10 using (13.3.5)). Alternatively, if the graph is plotted with $\log_{10}$
(dose) along the abscissa, as in Fig. 13.11.1, the calculated slope must
be converted to the correct units. In this example $b = L_1/N$ = 137.0/
16 = 8.5625 (from eqn. (13.10.2)).

To convert from logs to base $\sqrt{D}$ to logs to base 10, it is necessary,
using (13.3.5), to multiply the former by $\log_{10}\sqrt{D}$ as in § 13.3. Because






dose occurs in the denominator of the slope, b must be divided† by
$\log_{10}\sqrt{D}$, i.e. by $\tfrac{1}{2}\log_{10} D = \tfrac{1}{2}\log_{10}(0.32/0.28)$ = 0.0290. The required
slope is therefore b' = 8.5625/0.0290 = 295.3. The dose response
curves have the eqns. (13.3.3),

$$Y_S = \bar{y}_S + b'(x_S - \bar{x}_S),$$

$$Y_U = \bar{y}_U + b'(x_U - \bar{x}_U),$$

where x is now being used to stand for $\log_{10}$(dose), the abscissa of
Fig. 13.11.1. The response means are, from Table 13.11.1, $\bar{y}_S$ = (196
+260)/8 = 57.0 and $\bar{y}_U$ = (198+271)/8 = 58.625. The dose means
have not been needed explicitly because of the simplifications resulting
from the choice of dose metameter. For the standard, $\log_{10}$ 16.0
= 1.2041 and $\log_{10}$ 14.0 = 1.1461 so $\bar{x}_S$ = (4 × 1.2041 + 4 × 1.1461)/8
= 1.1751 (each dose occurs four times, remember). Similarly $\log_{10} z_{HU}$
= $\log_{10}$ 0.32 = −0.4949 and $\log_{10} z_{LU}$ = $\log_{10}$ 0.28 = −0.5528, so $\bar{x}_U$
= (4 × −0.4949 + 4 × −0.5528)/8 = −0.5238.

Substituting these results gives the lines plotted in Fig. 13.11.1 as
$Y_S = 57.0 + 295.3(x_S - 1.1751)$, $Y_U = 58.625 + 295.3(x_U + 0.5238)$.
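The conversion of the slope and the two fitted lines can be written out as below. This is a sketch with illustrative names (`b_prime`, `fitted_standard`, `fitted_unknown`); the response totals and doses are those of this example.

```python
import math

# Slope per unit of log dose to base sqrt(D), from eqn (13.10.2): b = L_1 / N.
b = 137.0 / 16
D = 0.32 / 0.28

# Dose is in the denominator of the slope, so b is DIVIDED by log10(sqrt(D))
# to give the slope per unit of log10(dose).
b_prime = b / math.log10(math.sqrt(D))

# Mean responses and mean log10 doses for standard (S) and unknown (U).
y_bar_s = (196 + 260) / 8
y_bar_u = (198 + 271) / 8
x_bar_s = (math.log10(16.0) + math.log10(14.0)) / 2
x_bar_u = (math.log10(0.32) + math.log10(0.28)) / 2


def fitted_standard(x):
    """Fitted response for the standard at x = log10(dose in ug)."""
    return y_bar_s + b_prime * (x - x_bar_s)


def fitted_unknown(x):
    """Fitted response for the unknown at x = log10(dose in ml)."""
    return y_bar_u + b_prime * (x - x_bar_u)
```

As a check on the units, the fitted response at the low standard dose, `fitted_standard(math.log10(14.0))`, equals $\bar{y}_S - b$, i.e. one unit of the base-$\sqrt{D}$ scale below the mean.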

13.12. A numerical example of a symmetrical (3+3) dose
parallel line assay

The results in Table 13.12.1, which are plotted in Fig. 13.12.1, are
measures of the tension developed by the isolated guinea pig ileum in
response to pure histamine (standard), and to a solution of a histamine
preparation containing various impurities as well as an unknown
amount of histamine. Five replicates of each of the six doses were
given, all to the same tissue, so there is a danger that one response
may affect the size of the next, contrary to the necessary assumption
that this does not happen (see discussion in § 13.1, p. 286). The doses
were arranged into five random blocks. The purpose of this arrangement
is the same as in § 13.11, and, as in that example, the order in
which the six doses were arranged in each block was decided strictly
at random using random number tables (see § 2.3).

This is a symmetrical assay as defined in (13.8.1), there being n = 5
responses at each of the k = 6 dose levels; $k_S = k_U = 3$ dose levels
for standard and for unknown; $n_S = n_U = 15$ responses for standard

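† More rigorously, the slope using $\log_{10}$(dose) is

$$\frac{dy}{d\log_{10} z} = \frac{dy}{d(\log_{\sqrt{D}} z \cdot \log_{10}\sqrt{D})} = \frac{1}{\log_{10}\sqrt{D}}\,\frac{dy}{d\log_{\sqrt{D}} z} = \frac{b}{\log_{10}\sqrt{D}}.$$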




Table 13.12.1

Responses of the isolated ileum. The doses were given in random order
(see text) in each block (time period), not in the order shown in the table

           Standard histamine dose          Unknown dose
           S1        S2        S3           U1        U2        U3
Block      4 ng/ml   8 ng/ml   16 ng/ml     8 ng/ml   16 ng/ml  32 ng/ml    Total

1          20.5      27.0      38.0         18.5      30.0      35.0        169.0
2          18.5      31.5      44.0         15.0      24.0      34.5        167.5
3          20.0      26.0      35.5         13.0      26.0      38.0        158.5
4          18.0      23.5      41.5         13.5      26.0      35.0        157.5
5          20.0      25.0      38.5         12.0      25.0      32.0        152.5

Total      97.0     133.0     197.5         72.0     131.0     174.5        805.0




Fig. 13.12.1. Results of symmetrical 3+3 dose assay from Table 13.12.1.
Abscissa: dose (ng/ml) on a logarithmic scale, marked as dose = 4, 8, 16, 32
ng/ml; as $\log_2$ dose = 2, 3, 4, 5; and as $\log_{10}$ dose = 0.602, 0.903, 1.204, 1.505.

○ Observed mean responses to standard.
△ Observed mean responses to unknown.
— Least squares lines constrained to be parallel (see §§ 13.4,
13.10 and end of this section).

The analysis indicates that these straight lines may well not fit the observations
adequately. The abscissa shows three equivalent ways of plotting the log dose.
Note that the ordinate does not start at zero.




and for unknown. The ratio between each dose and the one below it is
D = 2 throughout. The first stage is to perform an analysis of variance
on the responses to test the assay for non-validity. As for all assays,
this is a Gaussian analysis of variance, and the assumptions that
must be made have been discussed in §§ 4.2, 7.2, 11.2, and 12.2, which
should be read. Uncertainty about the assumptions means, as usual,
that the results are more uncertain than they appear.

Analysis of variance of the response (y) 

The first thing to be done is, as in § 13.11, a conventional Gaussian
analysis of variance for randomized blocks. Proceeding as in § 11.6,

(1) correction factor $= G^2/N = 805^2/30 = 21600.8333$;

(2) sum of squares between doses (treatments), with k−1 = 5
degrees of freedom, from (11.4.6),

$$= \frac{97.0^2}{5} + \frac{133.0^2}{5} + \ldots + \frac{174.5^2}{5} - 21600.8333 = 2179.0667;$$

(3) sum of squares between blocks, with n−1 = 4 degrees of freedom,
from (11.6.1),

$$= \frac{169.0^2}{6} + \ldots + \frac{152.5^2}{6} - 21600.8333 = 32.8333;$$

(4) total sum of squares, from (2.6.6) or (11.4.3),

$$= \sum(y - \bar{y})^2 = 20.5^2 + 18.5^2 + \ldots + 35.0^2 + 32.0^2 - 21600.8333 = 2328.6667;$$

(5) error (or residual) sum of squares, by difference,
= 2328.6667 − (2179.0667 + 32.8333) = 116.7667
with 29 − 5 − 4 = (6−1)(5−1) = 20 degrees of freedom.
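Steps (1) to (5) can be checked with the short sketch below. The individual responses are entered so as to agree with the row and column totals of Table 13.12.1 (an assumption where the printed digits are hard to read), and the variable names are illustrative.

```python
# Responses from Table 13.12.1: one row per block; columns S1, S2, S3, U1, U2, U3.
# Entries are taken to be consistent with the printed marginal totals.
responses = [
    [20.5, 27.0, 38.0, 18.5, 30.0, 35.0],
    [18.5, 31.5, 44.0, 15.0, 24.0, 34.5],
    [20.0, 26.0, 35.5, 13.0, 26.0, 38.0],
    [18.0, 23.5, 41.5, 13.5, 26.0, 35.0],
    [20.0, 25.0, 38.5, 12.0, 25.0, 32.0],
]
n_blocks, n_doses = len(responses), len(responses[0])
N = n_blocks * n_doses

dose_totals = [sum(col) for col in zip(*responses)]
block_totals = [sum(row) for row in responses]
grand_total = sum(block_totals)

# (1) correction factor, (2) doses, (3) blocks, (4) total, (5) error by difference.
correction = grand_total ** 2 / N
ss_doses = sum(t ** 2 for t in dose_totals) / n_blocks - correction
ss_blocks = sum(t ** 2 for t in block_totals) / n_doses - correction
ss_total = sum(y ** 2 for row in responses for y in row) - correction
ss_error = ss_total - ss_doses - ss_blocks

df_error = (N - 1) - (n_doses - 1) - (n_blocks - 1)
error_mean_square = ss_error / df_error
```

Each dose total is divided by the number of responses it contains (5, one per block) and each block total by 6, which is why the divisors differ between steps (2) and (3).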

These results can now be entered in the analysis of variance table,
Table 13.12.2. The next stage is to account for the differences observed
between the responses to the six doses, i.e. to partition the between
doses sum of squares into components representing different sources
of variability, as described in § 13.7. The simplified method described
in § 13.8 can be used because the assay is symmetrical. The coefficients,
$\alpha$, for construction of the contrasts are given in Table 13.8.2.

(a) Linear regression. From the coefficients in Table 13.8.2, and the
response totals in Table 13.12.1, the linear regression contrast is

$$L_1 = -97.0 + 197.5 - 72.0 + 174.5 = 203.0.$$




The corresponding sum of squares for linear regression is found, using
(13.9.4), to be

$$\mathrm{SSD} = \frac{L_1^2}{n\sum\alpha^2} = \frac{203^2}{5 \times 4} = 2060.45.$$

In this expression n = 5 is the number of responses at each dose level
(i.e. in each total), and $\sum\alpha^2$, the sum of squares of the coefficients, is
given in Table 13.8.2.

(b) Deviations from parallelism. The deviations from parallelism
contrast, from Table 13.8.2, is

$$L_1' = 97.0 - 197.5 - 72.0 + 174.5 = 2.0.$$

The corresponding sum of squares is

$$\mathrm{SSD} = \frac{L_1'^2}{n\sum\alpha^2} = \frac{2.0^2}{5 \times 4} = 0.20.$$

(c) Between standard and unknown preparations. The contrast, from
Table 13.8.2, is

$$L_p = -97.0 - 133.0 - 197.5 + 72.0 + 131.0 + 174.5 = -50.0.$$

The sum of squares, from (13.9.4) (using $\sum\alpha^2 = 6$ from Table 13.8.2), is

$$\mathrm{SSD} = \frac{L_p^2}{n\sum\alpha^2} = \frac{50.0^2}{5 \times 6} = 83.33.$$

(d) Deviations from linearity. The contrast, from Table 13.8.2, is

$$L_2 = 97.0 - (2 \times 133.0) + 197.5 + 72.0 - (2 \times 131.0) + 174.5 = 13.0$$

and the corresponding sum of squares, as before, is

$$\mathrm{SSD} = \frac{L_2^2}{n\sum\alpha^2} = \frac{13.0^2}{5 \times 12} = 2.82.$$

(e) Difference of curvature. The contrast, from Table 13.8.2, is

$$L_2' = -97.0 + (2 \times 133.0) - 197.5 + 72.0 - (2 \times 131.0) + 174.5 = -44.0,$$

and the corresponding sum of squares is

$$\mathrm{SSD} = \frac{(L_2')^2}{n\sum\alpha^2} = \frac{(-44.0)^2}{5 \times 12} = 32.27.$$






(f) Check on arithmetical accuracy. The total of the five sums of squares
just calculated is

2060.45 + 0.20 + 83.33 + 2.82 + 32.27 = 2179.07

agreeing, as it should, with the sum of squares between doses which
was calculated independently above.
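The five contrasts and their sums of squares can be recomputed as a check. The sketch below uses the dose totals of Table 13.12.1 and the coefficients of Table 13.8.2; the dictionary and variable names are illustrative only.

```python
# Dose totals from Table 13.12.1, in the order S1, S2, S3, U1, U2, U3 (n = 5 each).
dose_totals = (97.0, 133.0, 197.5, 72.0, 131.0, 174.5)
n = 5

# Coefficients from Table 13.8.2.
contrasts = {
    "L1": (-1, 0, 1, -1, 0, 1),      # linear regression
    "L1'": (1, 0, -1, -1, 0, 1),     # deviations from parallelism
    "Lp": (-1, -1, -1, 1, 1, 1),     # between preparations
    "L2": (1, -2, 1, 1, -2, 1),      # deviations from linearity
    "L2'": (-1, 2, -1, 1, -2, 1),    # difference of curvature
}

# Contrast values, eqn (13.9.1), and their sums of squares, eqn (13.9.4).
values = {
    name: sum(a * t for a, t in zip(alphas, dose_totals))
    for name, alphas in contrasts.items()
}
ss = {
    name: values[name] ** 2 / (n * sum(a * a for a in alphas))
    for name, alphas in contrasts.items()
}
ss_sum = sum(ss.values())
```

The five components together account for all 5 degrees of freedom between doses, so `ss_sum` reproduces the between-doses sum of squares apart from rounding.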

All these results are now assembled in an analysis of variance table, 
Table 13.12.2, which is completed as usual (cf. §§11.6 and 13.7). 
Divide each sum of squares by its number of degrees of freedom to find 



Table 13.12.2
The P value marked † is found from reciprocal F = 5.838/0.2,
see text

Source                          d.f.    SSD         MS          F         P

Lin. regression                  1      2060.45     2060.45     352.9     <0.001
Deviation from parallelism       1         0.20        0.20       0.034   0.8-0.9†
Between S and U                  1        83.33       83.33      14.27    ≈0.001
Deviations from linearity        1         2.82        2.82       0.48    >0.2
Difference of curvature          1        32.27       32.27       5.53    <0.05

Between doses                    5      2179.07      435.813     74.65    <0.001
Between blocks                   4        32.83        8.208      1.41    >0.2
Error                           20       116.77        5.838

Total                           29      2328.67











the mean squares. Then divide each mean square by the error mean 
square to find the variance ratios. The value of P is found from tables 
of the distribution of the variance ratio as described in § 11.3. As 
usual P is the probability of seeing a variance ratio equal to or 
greater than the observed value if the null hypothesis (that all 30 
observations were randomly selected from a single population) were 
true. 

Interpretation of the analysis of variance 

The interpretation of analyses of variance has been discussed in 
§§ 6.1, 11.3, and 11.6 and in the preceding example, § 13.11. As usual 
it is conditional on the assumptions being sufficiently nearly true, and 
must be regarded as optimistic (see §§ 7.2, 11.2, and 12.2). There is no






evidence for differences between blocks, so little or nothing was gained, 
and some degrees of freedom were lost, by using the block arrangement 
in this particular case (cf. § 13.11). The average slope of the dose 
response curves, shown in Fig. 13.12.1, is clearly not likely to be zero 
because if it were, a value of $F \geq 352.9$ would be exceedingly rare.
The question of parallelism is interesting, especially as the standard
and unknown were not identical substances. The variance ratio,
F(1,20) = 0.2/5.838 = 0.034, is very small so there is no hint of
deviations from parallelism. To find the P value for F < 1 the method
described in § 11.3 can be used. Looking up F(20,1) = 5.838/0.2
= 29.2 in tables of the variance ratio gives the probability of observing
an F value of 29.2 or larger as something between 0.1 and 0.2. Therefore
the probability of observing $F(1,20) \leq 0.034$ is 0.1 to 0.2, not so rare
that the lines must be considered as more nearly parallel than would
be expected on the basis of the observed experimental error. Another
way of stating the result is that in 80-90 per cent of repeated experiments
the F value for deviations from parallelism would be predicted
to be greater than 0.034 if the population lines were parallel.

Though neither the standard nor the unknown observations lie 
on straight lines, as seen in Fig. 13.12.1, the analysis of variance gives 
no hint of deviations from linearity. This is because the average of the 
two lines (to which the analysis refers) is very nearly straight. The 
observations lie on lines that curve in opposite directions so the curva- 
tures cancel when the slopes are averaged. In fact an F value corres- 
ponding to a difference in curvature as large as, or larger than, the 
observed one would be expected to occur, as a result of experimental 
error, in rather less than 5 per cent of repeated experiments. This 
cannot be explained further without doing more experiments. There 
could be a real difference in curvature as a result of the impurities in 
the unknown solution. On intuitive pharmacological grounds this
does not seem very likely so perhaps there is no real difference in 
curvature and a rarish (rather less than 1 in 20) chance has come off 
(see § 6.1). More experiments would be needed to tell. 

If the possibility of a real difference in curvature were not considered 
to invalidate the assay, the potency ratio and its confidence limits 
would be calculated as follows. 

The potency ratio 

In this example the doses of both standard and unknown are expressed
in the same units (ng/ml), so the units problem discussed in
§ 13.11 does not arise. The least squares estimate of the potency ratio,
from (13.10.17), is

    R = 0.5 antilog10(-0.09886) = 0.5 × 0.7964 = 0.398.†

From the definition of the potency ratio, (13.3.1), concentration of
unknown = R × concentration of standard. The unknown preparation
is thus estimated to contain 39.8 per cent by weight of histamine,
assuming that the impurities in it do not interfere with the assay.

Confidence limits for the potency ratio

The simplified form of Fieller's theorem for the (3+3) dose
symmetrical assay is (13.10.18), which gives confidence limits for the
population value of the potency ratio, where g = 2Ns²t²/3L_1²
according to (13.10.19). For this example

    (dose of standard)/(corresponding dose of unknown) = 4/8 = 0.5,

    L_p/L_1 = -50/203 = -0.2463,

    log10 D = log10 2 = 0.3010,

    s²[y] = 5.838, the error variance (from Table 13.12.2) with 20 degrees
    of freedom,

    s = √(5.838) = 2.416,

    t = 2.086 for P = 0.05 (for 95 per cent confidence limits) and 20 d.f.
    (from tables of Student's t, see §§ 4.4 and 7.4).

† To find the antilog of a negative number write it as the sum of a negative integer and
a positive part between 0 and 1. Thus, to find antilog (-0.09886), write -0.09886 in
the form -1 + 0.9011, which is conventionally written 1.9011 with a bar over the 1.
Look up antilog 0.9011 = 7.964, and move the decimal point one place to the left
(because of the barred 1) giving antilog (-0.09886) = 0.7964. Working from first
principles, antilog10 (-0.09886) = 10^(-0.09886), from the definition of logarithms,
and 10^(-0.09886) = 10^(-1) × 10^(0.9011) = 10^(-1) antilog (0.9011).
Thus g = 2 × 30 × 5.838 × 2.086²/(3 × 203²) = 0.01233
and (1 - g) = 0.9877.

As in the last example, g is small so the approximate formula for the
limits could be used, but before doing this the full equation, (13.10.18),
will be used to make sure that the approximation is adequate. Substituting
the above quantities into the general formula gives

    0.5 antilog10[(4 × 0.3010/3){-0.2463/0.9877
        ± (2.416 × 2.086/(203 × 0.9877)) √[30(0.9877 + (2/3)(-0.2463)²)]}]

    = 0.5 antilog10(-0.1561, -0.04406) = 0.349 to 0.452†.
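
The arithmetic of the full calculation can be retraced as follows. This is a sketch under stated assumptions rather than a transcription of (13.10.18): the variable names are invented for the illustration, and the expression under the square root is written in a form that reproduces the figures quoted in this section, so it should be checked against the general formula before being reused.

```python
import math

# Quantities quoted in the text for the (3+3) dose assay
dose_ratio = 4 / 8         # standard dose / corresponding unknown dose
L_p, L_1 = -50.0, 203.0    # the two quantities whose ratio is -0.2463
log_D = 0.3010             # log10 of the ratio between successive doses
N = 30                     # total number of responses
s2 = 5.838                 # error mean square, 20 d.f.
t = 2.086                  # Student's t for P = 0.05 and 20 d.f. (from tables)

m = L_p / L_1
g = 2 * N * s2 * t ** 2 / (3 * L_1 ** 2)
s = math.sqrt(s2)
scale = 4 * log_D / 3      # converts the ratio m to a log10 potency scale

centre = m / (1 - g)
# Assumed form of the square-root term; it reproduces the quoted limits
half_width = (s * t / (L_1 * (1 - g))) * math.sqrt(N * ((1 - g) + (2 / 3) * m ** 2))

R = dose_ratio * 10 ** (scale * m)
R_lower = dose_ratio * 10 ** (scale * (centre - half_width))
R_upper = dose_ratio * 10 ** (scale * (centre + half_width))
print(round(g, 5), round(R, 3), round(R_lower, 3), round(R_upper, 3))
```

Working with 10 ** x in place of antilog tables also removes the need for the barred-characteristic device described in the footnote.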



Approximate confidence limits 

Because g is much less than 1, the approximate formula for the
confidence limits (see §§ 13.5, 13.6, and 13.10) can be used, as in the
last example. Substituting into (13.10.20) gives the estimated standard
deviation of log10 R as

    s[log10 R] = (4 × 0.3010/(3 × 203)) √{5.838 × 30 [1 + (2/3)(-0.2463)²]}
               = 0.02669.

The approximate 95 per cent confidence limits are therefore, as in § 7.4,

    log10 R ± t s[log10 R] = log10 0.3982 ± (2.086 × 0.02669)

                           = -0.3999 ± 0.05568 = -0.4556 and -0.3442.

Taking antilogs† gives the confidence limits as 0.350 and 0.453, similar
to the values found from the full equation.
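
The approximate limits can be reproduced in a few lines. The sketch below simply evaluates the numerical expression for s[log10 R] shown in this section; the variable names are illustrative and the value of t is the tabulated one quoted in the text.

```python
import math

log_D, L_p, L_1 = 0.3010, -50.0, 203.0
N, s2, t = 30, 5.838, 2.086
R = 0.3982                       # potency ratio estimated in the text

m = L_p / L_1
# Estimated standard deviation of log10 R, as evaluated in the text
sd_log_R = (4 * log_D / (3 * L_1)) * math.sqrt(s2 * N * (1 + (2 / 3) * m ** 2))

log_R = math.log10(R)
approx_lower = 10 ** (log_R - t * sd_log_R)
approx_upper = 10 ** (log_R + t * sd_log_R)
print(round(sd_log_R, 5), round(approx_lower, 3), round(approx_upper, 3))
```

The limits are symmetrical on the logarithmic scale but not on the scale of R itself.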



Summary of the result 

The assay may have been invalid because of a difference in curvature
between the standard and unknown log dose-response curves. If this
difference were attributed to (a rather unlikely) chance the estimated
potency ratio would be 0.398, with 95 per cent Gaussian confidence
limits of 0.349 to 0.452. As usual, these confidence limits must be
regarded as optimistic (see § 7.2).



† See the footnote on antilogs of negative numbers earlier in this section.

The slope of the response-log dose lines, from (13.10.13), is b = 203/20
= 10.15. This is the slope using x = log_D (dose) (see § 13.2). It must be
divided by log10 D = 0.3010, giving b' = 33.72, the slope of the response
against log10 (dose) lines, which are plotted in Fig. 13.12.1. The full
argument is similar to that for the 2+2 dose assay given in detail in
§ 13.11.



13.13. A numerical example of an unsymmetrical (3+2) dose
parallel line assay



The general method of analysis for parallel line assays, when
the simplifications resulting from symmetry (defined in (13.8.1))
cannot be used, will be illustrated using the results shown in Table
13.13.1 and plotted in Fig. 13.13.1. The figures are not from a real
experiment; in real life a symmetrical design would be preferred. The
15 doses should be allocated strictly at random (see § 2.3) so a one way
analysis of variance (see § 11.4) is appropriate (given the assumptions
described in § 11.2).

Fig. 13.13.1. Results of an unsymmetrical 3+2 dose assay from Table
13.13.1, plotted against log10 dose (x).
    ○ Observed mean responses to standard.
    △ Observed mean responses to unknown.
    Lines: least squares lines constrained to be parallel (see
    § 13.4 and this section).



Table 13.13.1
Results of a 3 + 2 dose assay

                          Standard doses               Unknown doses
                                                                         Total
Dose (z)                  1.0      3.0      10.0       1.0      4.0
log10 dose (x = log10 z)  0.0      0.4771   1.0        0.0      0.6021

Responses (y)             9.4      18.0     27.7       13.6     25.1
                          10.8     18.8     28.1       12.8     25.0
                          10.1     17.9     28.2                24.0
                                   18.1

n                         3        4        3          2        3        15
Mean                      10.1     18.2     28.0       13.2     24.7
Total                     30.3     72.8     84.0       26.4     74.1
Total for preparation              187.1                   100.5         287.6



The analysis of variance of the responses
The one way analysis of variance is exactly as in § 11.4.

(1) Correction factor G²/N = 287.6²/15 = 5514.25066.

(2) Total sum of squares (from (2.6.5) or (11.4.3)), with N - 1 = 14
degrees of freedom,

    = 9.4² + 10.8² + ... + 24.0² - 5514.25066

    = 650.16933.

(3) Sum of squares between doses (from (11.4.5)) with 5 - 1 = 4
degrees of freedom,

    = 30.3²/3 + 72.8²/4 + ... + 74.1²/3 - 5514.25066

    = 647.48933.

(4) Error sum of squares, by difference,
    = 650.16933 - 647.48933 = 2.6800
with 14 - 4 = 10 degrees of freedom.
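
The four steps above translate directly into code. The following sketch uses the responses of Table 13.13.1 grouped by dose; the dictionary labels and the function name are illustrative only.

```python
# Responses from Table 13.13.1, grouped by dose (three standard, two unknown)
groups = {
    "S, x = 0.0":    [9.4, 10.8, 10.1],
    "S, x = 0.4771": [18.0, 18.8, 17.9, 18.1],
    "S, x = 1.0":    [27.7, 28.1, 28.2],
    "U, x = 0.0":    [13.6, 12.8],
    "U, x = 0.6021": [25.1, 25.0, 24.0],
}

def one_way_anova(groups):
    """Return (total SS, between-groups SS, error SS, error d.f.)."""
    all_y = [y for ys in groups.values() for y in ys]
    n_total = len(all_y)
    correction = sum(all_y) ** 2 / n_total                 # G squared over N
    ss_total = sum(y * y for y in all_y) - correction
    ss_between = sum(sum(ys) ** 2 / len(ys) for ys in groups.values()) - correction
    ss_error = ss_total - ss_between                       # by difference
    df_error = (n_total - 1) - (len(groups) - 1)
    return ss_total, ss_between, ss_error, df_error

ss_total, ss_between, ss_error, df_error = one_way_anova(groups)
print(round(ss_total, 5), round(ss_between, 5), round(ss_error, 4), df_error)
```

Because the group sizes are passed along with the data, the same function serves for equal or unequal numbers of responses per dose.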

The next stage is to divide up the sum of squares between doses, as 
described in § 13.7. It will be convenient first to calculate various 
quantities from the results. 




For the standard the figures in Table 13.13.1 give

    n_S = 10,

    Σx_S = (0 × 3) + (0.4771 × 4) + (1.0 × 3) = 4.9084

(remember that each dose occurs several times; cf. Table 12.6.2),

    x̄_S = 4.9084/10 = 0.49084,
    Σy_S = 30.3 + 72.8 + 84.0 = 187.1,
    ȳ_S = 187.1/10 = 18.71,

    Σ(x_S - x̄_S)² = (0² × 3) + (0.4771² × 4) + (1.0² × 3) - 4.9084²/10
                  = 1.50126 (from (2.6.5); again each x occurs several
                    times),

    Σy_S(x_S - x̄_S) = (0 × 9.4) + (0 × 10.8) + ... + (1.0 × 28.2)
                      - (4.9084 × 187.1)/10
                    = 26.89672 (found from (2.6.9) and (12.2.9), as in
                      (12.6.1)).

Similarly, for the unknown preparation,

    n_U = 5,
    Σx_U = (0 × 2) + (0.6021 × 3) = 1.80630,
    x̄_U = 1.80630/5 = 0.36126,
    Σy_U = 26.4 + 74.1 = 100.5,
    ȳ_U = 100.5/5 = 20.10,

    Σ(x_U - x̄_U)² = (0² × 2) + (0.6021² × 3) - 1.8063²/5 = 0.43503,
    Σy_U(x_U - x̄_U) = (0 × 26.4) + (0.6021 × 74.1) - (1.8063 × 100.5)/5
                    = 8.30898.

Now these results can be used to find the components of the sum of
squares between doses, as described in § 13.7.

(1) Linear regression, from (13.7.1),

    SSD = (26.89672 + 8.30898)²/(1.50126 + 0.43503) = 640.111405.

(2) Deviations from parallelism, from (13.7.2),

    SSD = 26.89672²/1.50126 + 8.30898²/0.43503 - 640.111405 = 0.472584.

(3) Between standard and unknown, from (13.7.3),

    SSD = 187.1²/10 + 100.5²/5 - 5514.25066 = 6.4403.


(4) Deviations from linearity, by difference,

    SSD = 647.48933 - 640.111405 - 0.472584 - 6.4403
        = 0.465045.
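
The subdivision of the sum of squares between doses can be sketched as below, using the numerical forms of (13.7.1) to (13.7.3) as they appear in this section. The helper name preparation_sums is hypothetical. Because the code carries full precision instead of the rounded intermediate values, the results agree with the text only to about three decimal places.

```python
# Dose groups from Table 13.13.1 as (log10 dose, responses)
standard = [(0.0, [9.4, 10.8, 10.1]),
            (0.4771, [18.0, 18.8, 17.9, 18.1]),
            (1.0, [27.7, 28.1, 28.2])]
unknown = [(0.0, [13.6, 12.8]),
           (0.6021, [25.1, 25.0, 24.0])]

def preparation_sums(doses):
    """Return n, sum of y, corrected sum of squares of x, corrected sum of products."""
    xs = [x for x, ys in doses for _ in ys]      # each x repeated once per response
    ys = [y for _, group in doses for y in group]
    n, sum_x, sum_y = len(ys), sum(xs), sum(ys)
    sxx = sum(x * x for x in xs) - sum_x ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return n, sum_y, sxx, sxy

n_s, sy_s, sxx_s, sxy_s = preparation_sums(standard)
n_u, sy_u, sxx_u, sxy_u = preparation_sums(unknown)

correction = (sy_s + sy_u) ** 2 / (n_s + n_u)
ss_doses = sum(sum(ys) ** 2 / len(ys) for _, ys in standard + unknown) - correction

ss_regression = (sxy_s + sxy_u) ** 2 / (sxx_s + sxx_u)
ss_parallelism = sxy_s ** 2 / sxx_s + sxy_u ** 2 / sxx_u - ss_regression
ss_preparations = sy_s ** 2 / n_s + sy_u ** 2 / n_u - correction
ss_linearity = ss_doses - ss_regression - ss_parallelism - ss_preparations

print(round(ss_regression, 3), round(ss_parallelism, 3),
      round(ss_preparations, 3), round(ss_linearity, 3))
```

The common slope of the parallel lines is (sxy_s + sxy_u)/(sxx_s + sxx_u), so it is available from the same intermediate quantities.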

These figures can now all be filled into the analysis of variance table 
(Table 13.13.2), which has the form of Table 13.7.1.



Table 13.13.2

Source of variation           d.f.   SSD        MS         F        P

Linear regression              1     640.111    640.11     2388     <0.001
Deviations from parallelism    1       0.473      0.473    1.76     >0.2
Between S and U                1       6.440      6.440    24.03    <0.001
Deviations from linearity      1       0.465      0.465    1.74     >0.2

Between doses                  4     647.489    161.872    604.0    <0.001
Error (within doses)          10       2.680      0.268

Total                         14     650.169









The interpretation of the analysis is just as in §§ 13.11 and 13.12. 
There is no evidence of invalidity, though if the responses to standard 
and unknown had been more nearly the same it would have increased 
precision slightly (see § 13.6).

Plotting the results 
The average slope of the dose response lines, from (13.4.5), is

    b = (26.89672 + 8.30898)/(1.50126 + 0.43503) = 18.18.

The slopes of lines fitted separately to standard and unknown would be
b_S = 26.89672/1.50126 = 17.92, and b_U = 8.30898/0.43503 = 19.10.
The lines plotted in Fig. 13.13.1 are therefore, from (13.3.3),

    Y_S = 18.71 + 18.18(x_S - 0.4908),
    Y_U = 20.10 + 18.18(x_U - 0.3613).

This calculation, but not the preceding ones, has been made rather
simpler than in §§ 13.11 and 13.12, because there is no simplifying
transformation to bother about.




The potency ratio 

From (13.3.7), the potency ratio is estimated to be (because x = log10
dose)

    R = antilog10[(0.49084 - 0.36126) + (20.10 - 18.71)/18.18]

      = antilog10(0.2060) = 1.607.



Confidence limits for the potency ratio 
Using the quantities already found,

    s²[y] = 0.2680 (the error mean square with 10 d.f. from Table 13.13.2),

    s = √(0.2680) = 0.5177,

    (ȳ_U - ȳ_S)/b = (20.10 - 18.71)/18.18 = 0.076458,

    ΣΣ(x - x̄)² = 1.50126 + 0.43503 = 1.9363 (summed over standard and
    unknown),

    t = 2.228 for P = 0.95 limits and 10 d.f. (from tables of Student's t;
    see §§ 4.4 and 7.4).

Thus, from (13.6.6),

    g = (2.228² × 0.2680)/(18.18² × 1.9363) = 0.00208

so (1 - g) = 0.9979.

Logs to the base 10 have been used, so the conversion factor
log10 10 = 1. The 95 per cent confidence limits for the population value
of R are therefore, from the general formula (13.6.6),

    antilog10[(0.49084 - 0.36126) + 0.076458/0.9979
        ± (0.5177 × 2.228/(18.18 × 0.9979))
          √{0.9979(1/10 + 1/5) + 0.076458²/1.9363}]

    = antilog10(0.2062 ± 0.03496)
    = 1.484 and 1.743.
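
The same calculation can be written as a small function. This is a sketch that follows the numerical substitution shown above, not a general implementation of (13.6.6); the function and argument names are illustrative. It assumes logarithms to base 10, so the conversion factor is 1.

```python
import math

# Summary quantities taken from the worked example in the text
xbar_s, xbar_u = 0.49084, 0.36126     # mean log10 doses
ybar_s, ybar_u = 18.71, 20.10         # mean responses
n_s, n_u = 10, 5
b = 18.18                             # common slope
sxx = 1.9363                          # pooled sum of squares of x
s2 = 0.2680                           # error mean square, 10 d.f.
t = 2.228                             # Student's t for 95 per cent limits, 10 d.f.

def fieller_log_ratio_limits(xbar_s, xbar_u, ybar_s, ybar_u, n_s, n_u, b, sxx, s2, t):
    """Return m, g and the lower and upper limits for log10 R."""
    m = (ybar_u - ybar_s) / b
    g = t ** 2 * s2 / (b ** 2 * sxx)
    centre = (xbar_s - xbar_u) + m / (1 - g)
    half = (math.sqrt(s2) * t / (b * (1 - g))) * math.sqrt(
        (1 - g) * (1 / n_s + 1 / n_u) + m ** 2 / sxx)
    return m, g, centre - half, centre + half

m, g, log_lower, log_upper = fieller_log_ratio_limits(
    xbar_s, xbar_u, ybar_s, ybar_u, n_s, n_u, b, sxx, s2, t)
R = 10 ** ((xbar_s - xbar_u) + m)
R_lower, R_upper = 10 ** log_lower, 10 ** log_upper
print(round(R, 3), round(g, 5), round(R_lower, 3), round(R_upper, 3))
```

Small differences in the last digit from the figures in the text arise because the text rounds intermediate values and uses four-figure antilog tables.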

Approximate confidence limits 
Because g is small (even smaller than in the last two examples),
the approximate formula, in its general form, can be used. In this case
M = log10 R so, using (13.6.8),

    var[log10 R] ≈ (0.268/18.18²)(1/10 + 1/5 + 0.076458²/1.9363)
                 = 2.4570 × 10^(-4).

The confidence limits for log10 R are therefore log10 R ± t√(var[log10 R]),
and √(var[log10 R]) = √(2.457) × 10^(-2) = 0.015675, giving the limits
as 0.2060 ± 2.228 × 0.015675 = 0.2060 ± 0.03492 = 0.1711 and 0.2409.
Taking antilogs gives the approximate confidence limits as 1.483 and
1.742.

Summary of the result

The assay is not demonstrably invalid. The potency ratio is estimated
to be 1.607, with 95 per cent Gaussian confidence limits of 1.484 to
1.743. The analysis depends on the assumptions discussed in §§ 4.2,
7.2, 11.2, and 12.2 and, as usual, the confidence limits are likely to be
too narrow (see § 7.2).

13.14. A numerical example of the standard curve (or calibration 
curve). Error of a value of x read off from the line 

In Chapter 12 the method for estimation of the error of a value of Y
(the dependent variable) read from the fitted line at a given value of x
was described. In § 12.4 it was mentioned that the reverse problem,
estimation of the error of a value of x interpolated from the fitted line
for a given (observed or hypothetical) value of y, is more complicated.
In fact the method is closely related to that used to estimate confidence
limits for the potency ratio, and an example will now be worked out.
The results in Table 13.14.1, which are plotted in Fig. 13.14.1, are
results of the sort that are obtained when measurements are made from
a standard calibration curve. This method is often used for chemical
assays. For example x could be concentration of solute, and y the
optical density of the solution measured in a spectrophotometer.
In this example x can be any independent variable (see § 12.1), or any
transformation of the measured variable, as long as y (the dependent
variable) is linearly related to x (unlike most of the rest of this chapter,
in which the discussion has been confined to parallel line assays in which
x = log dose). It is quite possible to deal with non-linear calibration
curves using polynomials (see § 12.7 and Goulden (1952)) but the
straight line case only will be dealt with here.

Table 13.14.1

             x     Observations (y)      Total    n    Mean

Standard     1     2.3   1.7              4.0     2    2.0
             2     5.4   4.7   4.9       15.0     3    5.0
             3     7.4   6.6             14.0     2    7.0
             4     9.7   8.9   8.4       27.0     3    9.0

Unknown            8.1   8.5             16.6     2    8.3

Fig. 13.14.1. The standard calibration curve (y against x) plotted from
the results in Table 13.14.1.
    ○ Observed mean responses to standard.
    Fitted least squares straight line (see text).
    95 per cent Gaussian confidence limits for the population (true)
    line, i.e. for the population value of y at any given x value
    (dot-dashed lines; see text).
    95 per cent Gaussian confidence limits for the mean of two new
    observations on y at any given x value (dashed lines; see text).
The graphical meaning of the confidence limits for the value of x
corresponding to the value of y observed for the unknown is illustrated.




Frequently the standard curve is determined first and it is assumed, 
as in this section, that it has stayed constant during the subsequent
period in which measurements are made on the unknowns. This 
requires separate verification, and it would obviously be better if 
standards and unknowns were given in random order or in random blocks. 
If this is done the unknowns can be incorporated in the analysis of 
variance as described in § 13.15, the effect of this being to reduce the 
risk of bias and to improve slightly the estimate of error by taking 
into account the scatter of replicate observations on the unknown. It 
will of course be an assumption that the scatter of responses is the same 
for all of the standards and for the unknowns, in addition to the other 
assumptions of the Gaussian analysis of variance which have been 
described in §§11.2 and 12.2. 

The straight line and its analysis of variance 

First a straight line is fitted to the results for the standard. The 
method has already been described in § 12.6, so only the bare bones 
of the calculations will be given here. The basic design is a one way 
classification with k_S = 4 independent groups (see § 11.4).

(1) Correction factor

    n_S = 10,

    Σy_S = 4.0 + 15.0 + 14.0 + 27.0 = 60.0,

    correction factor = 60.0²/10 = 360.0.

(2) Total sum of squares, from (2.6.5) (cf. (11.4.3)),

    = 2.3² + 1.7² + ... + 8.4² - 360.0 = 65.6200.

(3) Sum of squares between groups, from (11.4.5),

    = 4²/2 + 15²/3 + 14²/2 + 27²/3 - 360.0 = 64.0000.

Although the x values are equally spaced, the simplifying
transformation described at the end of § 12.6 cannot be used, because the
number of observations is not the same at each x value.

(a) Sum of squares due to linear regression. First calculate
    Σx_S = (1 × 2) + (2 × 3) + (3 × 2) + (4 × 3) = 26.0
and x̄_S = 26.0/10 = 2.60.




The sum of products, from (2.6.7) (see § 12.6), is

    Σy_S(x_S - x̄_S) = (1 × 4.0) + (2 × 15.0) + (3 × 14.0) + (4 × 27.0)
                      - (26.0 × 60.0)/10
                    = 28.00.

The sum of squares for x is, from (2.6.5),

    Σ(x_S - x̄_S)² = (1² × 2) + (2² × 3) + (3² × 2) + (4² × 3) - 26.0²/10
                  = 12.40.

Thus, sum of squares due to linear regression, from (12.3.4),

    = 28.00²/12.4 = 63.2258.

(b) Sum of squares for deviations from linearity. By difference

    SSD = 64.0000 - 63.2258 = 0.7742.

(4) Sum of squares for error (within groups sum of squares). By
difference

    SSD = 65.6200 - 64.0000 = 1.6200.
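
The whole of this analysis can be retraced from the raw observations on the standard. The sketch below is illustrative only (the variable names are not from the text) and works with the unequal numbers of replicates directly, so no simplifying transformation is needed.

```python
# Standards from Table 13.14.1: x value -> replicate observations y
standards = {1: [2.3, 1.7], 2: [5.4, 4.7, 4.9], 3: [7.4, 6.6], 4: [9.7, 8.9, 8.4]}

xs = [x for x, group in standards.items() for _ in group]   # x repeated per observation
ys = [y for group in standards.values() for y in group]
n = len(ys)

correction = sum(ys) ** 2 / n
ss_total = sum(y * y for y in ys) - correction
ss_groups = sum(sum(grp) ** 2 / len(grp) for grp in standards.values()) - correction

sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n   # sum of products
sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n                    # sum of squares for x

ss_regression = sxy ** 2 / sxx
ss_linearity = ss_groups - ss_regression        # by difference
ss_error = ss_total - ss_groups                 # by difference

ms_error = ss_error / (n - len(standards))
F_regression = ss_regression / ms_error
F_linearity = (ss_linearity / (len(standards) - 2)) / ms_error
print(round(ss_regression, 4), round(ss_linearity, 4), round(ss_error, 4),
      round(F_regression), round(F_linearity, 2))
```

The variance ratios are obtained by dividing each mean square by the error mean square, which has 6 degrees of freedom here.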

These results can now be entered in the analysis of variance table, 
Table 13.14.2. 

Table 13.14.2

Source                        d.f.   SSD        MS         F        P

Lin. regression                1     63.2258    63.2258    234      <0.001
Dev. from linearity            2      0.7742     0.3871    1.43     >0.2

Between groups (x values)      3     64.0000    21.3333    79.0
Error (within groups)          6      1.6200     0.2700

Total                          9     65.6200









The interpretation is as in § 12.6. There is strong evidence that y
increases with x. If the true line were straight then an F value for
deviations from linearity equal to or greater than 1.43 would be expected
in more than 20 per cent (P > 0.2) of repeated experiments (given the
assumptions; see §§ 11.2 and 12.2), so there is no reason to believe the
true line is not straight. However this analysis does not distinguish
between systematic and unsystematic deviations from linearity. 
Looking at Fig. 13.14.1 suggests the deviations in this case, though no 
larger than would be expected on the basis of experimental error, are of 
a systematic sort. The line appears to be flattening out. Now physical 
considerations, and past experience suggest that this is just the sort of 
nonlinearity that would be expected in a plot of, say, optical density 
against concentration. In a case like this it would be rather rash to fit 
a straight line, in spite of the fact that there are no grounds for rejecting 
the null hypothesis that the true (population) line is straight. This is a 
good example of the practical importance of the logical fact explained in 
§ 6.1, that if there are no grounds for rejecting a hypothesis this does not
mean that there are good grounds for accepting it. In a small experiment, 
such as this, with substantial experimental errors, it is more than 
likely that deviations from linearity that are real, and large enough to be 
of practical importance, would not be detected with any certainty. 
The verdict is not proven (see § 6.1). For purposes of illustration, a 
straight line will now be fitted, though the foregoing remarks suggest that a 
polynomial (see above) would be safer. The least squares estimates of the 
parameters (see § 12.2) are thus, from (12.2.6), 

    a_S = ȳ_S = Σy_S/n_S = 60.0/10 = 6.00

and, from (12.2.8),

    b_S = Σy_S(x_S - x̄_S)/Σ(x_S - x̄_S)² = 28.00/12.40 = 2.2581,

so the fitted line is

    Y_S = a_S + b_S(x_S - x̄_S) = 6.00 + 2.2581(x_S - 2.60)      (13.14.1)
        = 0.1289 + 2.2581 x_S

and this is the straight line plotted in Fig. 13.14.1.
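
A least squares fit of this kind takes only a few lines. The following sketch fits the line in the centred form Y = a + b(x - x̄) used in (13.14.1); the function names fit_line and predict are illustrative.

```python
def fit_line(points):
    """Least squares line through (x, y) pairs; returns (xbar, ybar, slope)."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    return xbar, ybar, sxy / sxx

# Observations on the standard from Table 13.14.1
points = [(1, 2.3), (1, 1.7), (2, 5.4), (2, 4.7), (2, 4.9),
          (3, 7.4), (3, 6.6), (4, 9.7), (4, 8.9), (4, 8.4)]

xbar, a, b = fit_line(points)
intercept = a - b * xbar            # value of Y at x = 0

def predict(x):
    """Estimated Y at a given x, from the centred form of the line."""
    return a + b * (x - xbar)

print(round(a, 2), round(b, 4), round(intercept, 4))
```

The centred form is convenient because a is simply the mean response, and a and b are then uncorrelated estimates.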

Interpolation of the unknown 

The mean of the two observations (n_U = 2) on the unknown, from
Table 13.14.1, is ȳ_U = 8.30. The equation for the standard line, (13.14.1),
is Y = a + b(x - x̄), and rearranging this to find x gives

    x_S = x̄_S + (Y_S - a_S)/b_S.                               (13.14.2)

The estimate of x_U (e.g. concentration) corresponding to the mean
observation ȳ_U (e.g. optical density) on the unknown is therefore, from
(13.14.1) and (13.14.2),

    x_U = x̄_S + (ȳ_U - ȳ_S)/b_S
        = 2.60 + (8.30 - 6.00)/2.2581
        = 3.619,                                                (13.14.3)

as shown graphically in Fig. 13.14.1.

Gaussian confidence limits for the interpolated x value

The approach is exactly like that in § 13.6. In (13.14.3) x̄_S is an
accurately measured constant. If the observations are normally
distributed (see § 4.2), then ȳ_U - a_S = ȳ_U - ȳ_S will be a normally
distributed variable, and so will the slope of the standard line, b_S (see
§ 12.4, especially (12.4.1)). Therefore (ȳ_U - ȳ_S)/b_S = m, say, will be the
ratio of two normally distributed variables and Fieller's theorem (see
§ 13.5) can be used, as in § 13.6, to find confidence limits for its true
value. If the error mean square from Table 13.14.2 (s² = 0.2700) is
taken as the variance of the observations on the unknown as well as
the variance of the observations on the standard then, from (2.7.9),
var(ȳ_S) = s²/n_S, var(ȳ_U) = s²/n_U. Because the observations on standard
and unknown are assumed independent, it follows from (2.7.3) that
var(ȳ_U - ȳ_S) = var(ȳ_U) + var(ȳ_S) = s²(1/n_U + 1/n_S), and so from (13.6.3),

    v11 = (1/n_U + 1/n_S),

    v22 = 1/Σ(x_S - x̄_S)² (from (12.4.2)),

    v12 = 0 (as in § 13.6).

In the present example

    s² = 0.2700 with 6 d.f. (from Table 13.14.2),

    s = √(0.2700) = 0.5196,

    t = 2.447 for P = 0.95 and 6 d.f. (from tables of Student's t; see
    § 4.4),

    m = (ȳ_U - ȳ_S)/b = (8.30 - 6.00)/2.2581 = 1.01856,

    g = t²s²v22/b² = (2.447² × 0.2700)/(2.2581² × 12.40)
      = 0.02557 (from (13.6.8)),

    (1 - g) = 0.9744.



The 95 per cent confidence limits for the true value of x_U therefore
follow from (13.5.9) (by adding x̄_S to the confidence limits for (ȳ_U - ȳ_S)/b;
cf. § 13.6) and are

    x̄_S + m/(1 - g)
        ± (st/(b(1 - g))) √{(1 - g)(1/n_U + 1/n_S) + m²/Σ(x_S - x̄_S)²},

i.e. in the present case

    2.60 + 1.01856/0.9744
        ± (0.5196 × 2.447/(2.2581 × 0.9744)) √[0.9744(1/2 + 1/10) + 1.01856²/12.40]

    = 3.173 and 4.118.

Because g is fairly small, similar limits would have been found by
using the approximate formula, from (13.5.12), var(x_U) ≈ s²(v11
+ m²v22)/b². The limits are not symmetrical about x_U = 3.619 unless g is
negligibly small (or unless ȳ_U = ȳ_S), as discussed in § 13.5. In this
case the limits expressed as percentage deviations from 3.619 are
-12.3 per cent to +13.8 per cent. The graphical meaning of the limits
is discussed below.
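
As a check on the arithmetic, the interpolation and its limits can be coded as follows. This is a sketch that follows the numerical substitution in this section, with s² and t taken from the text; the function name inverse_prediction is an illustrative choice.

```python
import math

xbar_s, ybar_s, b = 2.60, 6.00, 2.2581   # fitted standard line
sxx, n_s = 12.40, 10                     # sum of squares for x, number of standards
ybar_u, n_u = 8.30, 2                    # mean and number of unknown observations
s2, t = 0.2700, 2.447                    # error mean square (6 d.f.) and Student's t

def inverse_prediction(xbar_s, ybar_s, b, sxx, n_s, ybar_u, n_u, s2, t):
    """Return the interpolated x, g, and the lower and upper confidence limits."""
    m = (ybar_u - ybar_s) / b
    g = t ** 2 * s2 / (b ** 2 * sxx)
    x_hat = xbar_s + m
    centre = xbar_s + m / (1 - g)
    half = (math.sqrt(s2) * t / (b * (1 - g))) * math.sqrt(
        (1 - g) * (1 / n_u + 1 / n_s) + m ** 2 / sxx)
    return x_hat, g, centre - half, centre + half

x_hat, g, x_lower, x_upper = inverse_prediction(
    xbar_s, ybar_s, b, sxx, n_s, ybar_u, n_u, s2, t)
pct_lower = 100 * (x_lower - x_hat) / x_hat
pct_upper = 100 * (x_upper - x_hat) / x_hat
print(round(x_hat, 3), round(g, 5), round(x_lower, 3), round(x_upper, 3))
```

The percentage deviations computed at the end show the asymmetry of the limits about the interpolated value.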



Summary of the result 

The unknown value of x corresponding to ȳ_U = 8.3 is x_U = 3.619
with 95 per cent Gaussian confidence limits from 3.173 to 4.118. These
results depend on the assumptions described in §§ 7.2, 11.2, and 12.2
and, as usual, must be considered to be optimistic (see § 7.2).



Confidence limits for the population calibration line 

Assuming the true line to be straight, limits for its position can be 
calculated as described in § 12.4. Another example was worked out in 
§ 12.5. In this case var(y) = 0.2700, the error mean square with
6 d.f. from Table 13.14.2, N = 10 (the number of observations used
to fit the line), t = 2.447 as above, x̄_S = 2.60 and Σ(x - x̄)² = 12.40 as
above. Using these values var(Y), and hence the confidence limits,
can be calculated at enough values of x to plot the limits, which are
shown as dot-dashed lines in Fig. 13.14.1. Two representative
calculations follow.

(1) At x = 1.0. At this point the estimated value of y is, from (13.14.1),
    Y = 0.1289 + (2.2581 × 1.0) = 2.387



and, from (12.4.4),

    var(Y) = 0.2700[1/10 + (1.0 - 2.60)²/12.40] = 0.082742.

The 95 per cent Gaussian confidence limits for the population value of
Y at x = 1.0 are therefore, from (12.4.5), 2.387 ± 2.447 × √(0.082742)
= 1.683 and 3.091.

(2) At x = 2.0, Y = 0.1289 + (2.2581 × 2.0) = 4.645,

    var(Y) = 0.2700[1/10 + (2.0 - 2.60)²/12.40] = 0.034839,

giving confidence limits of 4.645 ± (2.447 × √0.034839) = 4.188
and 5.102.

Confidence limits for the mean of two new observations at a given x. The
graphical meaning of Fieller's theorem.

In § 12.4 a method was described for finding limits within which new
observations on y (rather than the population value of y), at a given x,
would be expected to lie. In the present example there are n_U = 2
new observations on the unknown. Using eqn (12.4.6) with m = 2,
N = 10, and the other values as above, these limits can be calculated
for enough values of x for them to be plotted. They are shown as
dashed lines in Fig. 13.14.1. Two representative calculations, using
(12.4.6), follow.

(1) At x = 1.0. At this point Y = 2.387 as above. The 95 per cent
confidence limits for the mean of two new observations are, from
(12.4.6),

    2.387 ± 2.447 √{0.2700[1/2 + 1/10 + (1.0 - 2.60)²/12.40]}
    = 1.245 and 3.529.

(2) At x = 2.0, Y = 4.645 as above, and the limits are, from
(12.4.6),

    4.645 ± 2.447 √{0.2700[1/2 + 1/10 + (2.0 - 2.60)²/12.40]}
    = 3.637 and 5.653.

These limits are seen to be wider than the limits for the population
value of Y as would be expected when the uncertainty in the new
observations is taken into account. They are also less strongly curved.



The mean of the two observations on the unknown in Table 13.14.1
was ȳ_U = 8.3, and the corresponding value of x_U read off from the line
was 3.619 as calculated above, and as shown in Fig. 13.14.1. The
95 per cent confidence limits for x_U at y = 8.3 were found above to be
3.173 to 4.118. It can be seen in Fig. 13.14.1 that these are the points
where the line for y = 8.3 intersects the confidence limits just
calculated (the limits for the mean of two new observations at a given x).
The limits found from Fieller's theorem (13.5.9) are, in general, the
same as those found graphically via (12.4.6).
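
The two sets of limits, and the graphical meaning of Fieller's theorem, can be illustrated numerically. The sketch below evaluates the limits in the forms shown in the representative calculations, then confirms that the limits for the mean of two new observations pass through y = 8.3 at x = 3.173 and x = 4.118. The function names are illustrative.

```python
import math

a, b, xbar, sxx = 6.00, 2.2581, 2.60, 12.40   # fitted line and sum of squares for x
N, s2, t = 10, 0.2700, 2.447                  # observations on the line, error MS, t

def line(x):
    """Estimated Y at x from the fitted standard line."""
    return a + b * (x - xbar)

def limits_population(x):
    """95 per cent limits for the population value of Y at x."""
    half = t * math.sqrt(s2 * (1 / N + (x - xbar) ** 2 / sxx))
    return line(x) - half, line(x) + half

def limits_new_mean(x, m=2):
    """95 per cent limits for the mean of m new observations at x."""
    half = t * math.sqrt(s2 * (1 / m + 1 / N + (x - xbar) ** 2 / sxx))
    return line(x) - half, line(x) + half

for x in (1.0, 2.0):
    print(x, [round(v, 3) for v in limits_population(x)],
          [round(v, 3) for v in limits_new_mean(x)])

# Graphical check: y = 8.3 should meet the new-observation limits at the Fieller limits
upper_at_3173 = limits_new_mean(3.173)[1]
lower_at_4118 = limits_new_mean(4.118)[0]
print(round(upper_at_3173, 2), round(lower_at_4118, 2))
```

The upper limit reaches 8.3 at the lower Fieller limit, and the lower limit reaches 8.3 at the upper Fieller limit, which is what is read off from the figure.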

13.15. The (k+1) dose assay and rapid routine assays

In this section the (k_S + 1) dose parallel line assay will be illustrated
using the same results that were used to illustrate the calibration 
curve analysis in § 13.14. 

Routine assays 

The (2+2) or (3+3) dose assays should be preferred for accurate 
assays. The (k_S + 1) dose assay probably occurs most frequently in
the form of the (2+1) dose assay in which the unknown is interpolated
between 2 standards. This is the fastest method and is often used when
large numbers of unknowns have to be assayed. It is rare in practice
for the doses to be arranged randomly, or in random blocks of 3 doses
(k_S + 1 doses in general). Even worse, standard and unknowns are often
given alternately, so each standard is used to interpolate both the 
unknown immediately before it and the unknown immediately after it. 
This introduces correlation between duplicate estimates, making the 
estimation of error difficult. Quite often the samples to be assayed will 
come from an experiment in which replicate samples were obtained, 
and several assays will be done on each of the replicate samples. In 
this case a reasonable compromise between speed and statistical 
purity is to do (2+ 1) dose assays with alternate standard and unknown, 
and to interpolate each unknown response between the standard 
responses (one high and one low) on each side of it. The replicate assays 
on each sample are then simply averaged. An estimate of error can 
then be obtained from the scatter of the average assay figures for 
replicate samples rather than doing the calculations described below. 
The treatments should have been applied in random order (see § 2.3) 
in the original experiment and the samples should be assayed in random 
order. If the ratio between the high and low standard doses is small 
(say less than 2) it will usually be sufficiently accurate to interpolate
linearly (rather than logarithmically) between the standards. See
Colquhoun and Tattersall (1969) for further discussion. 

A numerical example of a (4+ 1) dose parallel line assay 

In a parallel line assay x = log (dose) by definition (see § 13.1),
unlike § 13.14 in which x could have been any independent variable.
In biological assays it is usual to specify the dose of unknown (e.g. in
ml or g of impure solid) and to compute a potency ratio R (see § 13.3),
rather than to interpolate the unknown response on the standard curve
as in § 13.14. (Equation (13.14.3) together with (13.3.7) is seen to
imply log R = 0, i.e. R = 1, which simply means that a given dose, in
terms of the active substance, gives the same response whether it is
labelled standard or unknown.) Suppose, for example, that x in Table
13.14.1 represents the log10 of the standard dose (measured in ml) in a
(4+1) dose parallel line assay. Suppose further that a log10 (dose
in ml) of unknown, x_U = 2.0, is administered twice and produces
responses y_U = 8.1 and 8.5, so ȳ_U = 8.3 as in Table 13.14.1. This
assay is plotted in Fig. 13.15.1. Using the general formula for the
log potency ratio, (13.3.7), gives, using (13.14.3),

    log10 R = 3.619 - 2.00 = 1.619.

Taking antilogs gives R = 41.59 which means that 41.59 ml of standard
must be given to produce the same effect as 1 ml of unknown.

Fig. 13.15.1. If x in § 13.14 (Table 13.14.1) were log dose, then the results
in Table 13.14.1 could be treated as a 4+1 dose parallel line assay, as illustrated,
as an alternative to the treatment as a standard curve problem which was worked
out in § 13.14. The observations and fitted line are as in Fig. 13.14.1 with the
addition that the dose of unknown required to produce the unknown responses
has been specified. The interval log10 R = 1.619, ending at x = 3.619, is marked
on the log10 dose (x) axis.

It was mentioned in § 13.14 that it is dangerous to determine the
standard curve first and then to measure the unknowns later unless
there is very good reason to believe that the standard curve does not
change with time. It is preferable to do the standards and unknowns
(all 12 measurements in Table 13.14.1) in random order (or in random
blocks of 5 measurements, cf. §§ 13.11 and 13.12). If this had been done
the analysis of variance would follow the lines described in § 13.7,
except that there can obviously be no test of parallelism with only one
dose of unknown (a 2+1 dose assay would have no test of deviations
from linearity either). There are now 5 groups and 12 observations. The
total, between group and error sums of squares are found in the usual way
(see §§ 11.4, 13.13, or 13.14) from the 5 groups of observations in Table
13.14.1. The results are shown in Table 13.15.1. The between groups sum
of squares can be split up into components using the general formulae
(13.7.1) to (13.7.3). In a (k_S + 1) dose assay there is only one unknown
dose so x_U = x̄_U, i.e. (x_U - x̄_U) = 0 so the expressions for the slope
(13.4.5) and the sum of squares for linear regression, (13.7.1), reduce
to those used already in § 13.14, which are entered in Table 13.15.1
(it is only common sense that the observations on the unknown can
give no information on the slope of the log dose-response line). The
sum of squares for differences in responses to standard and unknown,
from (13.7.3), is

    60²/10 + 16.6²/2 - 76.6²/12 = 8.8167.

When this is entered in Table 13.15.1 the sum of squares for deviations
from linearity can be found by difference. It is seen to be identical
with that in Table 13.14.2, as expected.

The error variance in Table 13.15.1 is 0.2429, less than the figure of
0.2700 from Table 13.14.2. Inclusion of the unknown responses has
slightly reduced the estimate of error because they are in relatively
good agreement. The interpretation is the same as in § 13.14.

The confidence limits for the log potency ratio can be found from the
general parallel line assay formula, (13.6.6). The calculation is, with
any luck, seen to be exactly the same as in § 13.14 except that $x_U = 2.00$
is subtracted from the result. The limits are therefore 3.173 − 2.00
= 1.173 and 4.118 − 2.00 = 2.118. Taking antilogs gives the 95 per cent
Gaussian confidence limits for the true value of R (estimated as
41.59) as 14.89 to 131.2; not a very good assay.

Table 13.15.1

Source                       d.f.   SSD        MS         F       P
Linear regression             1     63.2258    63.2258    260     <0.001
Between std. and unknown      1      8.8167     8.8167    36.3    <0.001
Deviations from linearity     2      0.7742     0.3871    1.60    >0.2
Between doses                 4     72.8167    18.2042    74.9    <0.001
Error (within doses)          7      1.7000     0.2429
Total                        11     74.5167
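The arithmetic of this section can be checked with a few lines of code. The following is only a sketch in Python, using the totals implied by the expression for the sum of squares above (60 from the 10 standard responses, 16.6 from the 2 unknown responses) and the logarithms quoted in the text; the variable names are illustrative and are not taken from the book.

```python
# Sum of squares between standard and unknown, from the group totals.
total_standard, n_standard = 60.0, 10
total_unknown, n_unknown = 16.6, 2
grand_total = total_standard + total_unknown
n_total = n_standard + n_unknown

ss_between = (total_standard ** 2 / n_standard
              + total_unknown ** 2 / n_unknown
              - grand_total ** 2 / n_total)

# Variance ratio against the error mean square of Table 13.15.1.
error_ms = 0.2429
f_ratio = ss_between / error_ms

# Log potency ratio and its limits, converted back by taking antilogs.
log_r = 3.619 - 2.00
log_limits = (3.173 - 2.00, 4.118 - 2.00)
potency_ratio = 10 ** log_r
potency_limits = tuple(10 ** v for v in log_limits)

print(round(ss_between, 4), round(f_ratio, 1))
print(round(potency_ratio, 2), [round(v, 2) for v in potency_limits])
```

The wide interval on the dose scale follows directly from the width of the interval on the log scale: a difference of about one log unit between the limits is a tenfold range of potency.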






14. The individual effective dose, 
direct assays, all-or-nothing 
responses and the probit 
transformation 



14.1. The individual effective dose and direct assays 

The quantity of, for example, a drug needed to just produce any 
specified response (e.g. convulsions or heart failure) in an animal is 
referred to as the individual effective dose (IED) for that animal and 
will be denoted z. More generally, the amount or concentration of any 
treatment needed to just produce any specified effect on a test object 
can be treated as an IED. A standard preparation of a drug, and a 
preparation of the same drug of unknown concentration, can be used 
to estimate the unknown concentration. This sort of biological assay is 
usually referred to as a direct assay. 

A group of animals is divided randomly (see § 2.3) into two sub- 
groups. On each animal (test object, in general) of one group the IED 
of a standard solution of the substance to be assayed is measured. The 
IED of the unknown solution is measured on each animal of the 
other group. 

It is important to notice that in this case the dose is the variable not 
the response as was the case in Chapter 13. 

If the doses of both solutions are measured in the same units (see
§ 13.11) then the dose (z ml, say) needed for a given amount (in mg,
say) of substance is inversely proportional to the concentration of the
solution. The object of the assay is to find the potency ratio (R) of the
solutions, i.e. the ratio of their concentrations. Thus

$$R = \frac{\text{concentration of unknown}}{\text{concentration of standard}} = \frac{\text{population mean IED of standard}}{\text{population mean IED of unknown}}. \qquad (14.1.1)$$

In practice the population mean† IEDs must, of course, be replaced
by sample estimates, the average, $\bar{z}$, of the observed IEDs. The question
immediately arises as to what sort of average should be used.

† See Appendix 1.

If the IEDs were normally distributed there are theoretical reasons
(see §§ 2.5, 4.5, and 7.1) for preferring to calculate the arithmetic
mean IED for each preparation (standard and unknown). In this case
the estimated potency ratio would be $R = \bar{z}_S/\bar{z}_U$. Because the IED has
been supposed to be a normally distributed variable, this is the ratio
of two normally distributed variables. A pooled estimate of the variance
$s^2[z]$ could be found from the scatter within groups (as in § 9.4). The
confidence limits for R could then be found from Fieller's theorem,
eqn. (13.5.9), with $v_{11} = 1/n_S$ and $v_{22} = 1/n_U$, where $n_S$ and $n_U$ are the
numbers of observations in each group. (Because each IED is supposed
to be independent of the others, $v_{12} = 0$.)

However, if the IEDs are lognormally distributed (see § 4.5) then 
the problem is simpler. Tests of normality are discussed in § 4.6. 

Use of the logarithmic dose scale for direct assays 

In those cases in which it has been investigated it has often been
found that the logarithm of the IED (x = log z, say) is normally
distributed (i.e. z is lognormally distributed, see § 4.5). It therefore
tends to be assumed that this will always be so, though, as usual, there
is no evidence one way or the other in most cases. If it were so then it
would be appropriate to take the logarithm of each observation and
carry out the calculations on the x = log z values, because they will be
normally distributed. (In parallel line assays a logarithmic scale is
used for the dose, which is the independent variable and has no
distribution, for a completely different reason: to make the dose-response
curve straight. See §§ 11.2, 12.2, and 13.1, p. 283.)

Taking logarithms of both sides of (14.1.1) gives the log of the
potency ratio (M, say) as

$$\log R = \log(\text{IED of S}) - \log(\text{IED of U}).$$

If the log IED is denoted x = log z then it follows that the estimated
log potency ratio will be

$$M = \log R = \bar{x}_S - \bar{x}_U. \qquad (14.1.2)$$

The variance of this will, because the estimates of IED have been
assumed to be independent, be

$$\operatorname{var}(M) = \operatorname{var}(\bar{x}_S) + \operatorname{var}(\bar{x}_U) = \frac{\operatorname{var}(x_S)}{n_S} + \frac{\operatorname{var}(x_U)}{n_U}, \qquad (14.1.3)$$

from (2.7.3) and (2.7.8). It is necessary, as in § 9.4, to assume that the
scatter of the measurements (x values) is the same in both groups so a
pooled estimate of var(x) is calculated from the scatter of the logs of
the observations within groups as in § 9.4, and used as the best estimate
of both $\operatorname{var}(x_S)$ and $\operatorname{var}(x_U)$. The confidence limits for the log potency
ratio are then $M \pm t\sqrt{\operatorname{var}(M)}$ as in § 7.4. Taking antilogarithms of
these, and of (14.1.2), gives the estimates of R and its confidence limits.
A numerical example is given by Burn, Finney, and Goodwin (1950,
pp. 44-8).
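The calculation in (14.1.2) and (14.1.3) can be sketched as follows. This is a minimal illustration in Python, not the worked example cited above: the function name, the IED values, and the way Student's t is supplied (looked up by the caller for $n_S + n_U - 2$ degrees of freedom) are all assumptions made for the sketch.

```python
import math
from statistics import mean

def log_potency_ratio(ieds_standard, ieds_unknown, t_value):
    """Return (R, lower limit, upper limit) for a direct assay on the log scale.

    t_value is Student's t for the chosen probability with
    n_S + n_U - 2 degrees of freedom, supplied by the caller.
    """
    x_s = [math.log10(z) for z in ieds_standard]
    x_u = [math.log10(z) for z in ieds_unknown]
    n_s, n_u = len(x_s), len(x_u)
    mean_s, mean_u = mean(x_s), mean(x_u)

    # Pooled within-group variance of x = log z (equal scatter assumed).
    ssd = (sum((x - mean_s) ** 2 for x in x_s)
           + sum((x - mean_u) ** 2 for x in x_u))
    var_x = ssd / (n_s + n_u - 2)

    m = mean_s - mean_u                    # log potency ratio, (14.1.2)
    var_m = var_x / n_s + var_x / n_u      # its variance, (14.1.3)
    half_width = t_value * math.sqrt(var_m)
    return 10 ** m, 10 ** (m - half_width), 10 ** (m + half_width)

# Hypothetical IEDs (ml), invented purely for illustration.
standard = [1.55, 1.58, 1.71, 1.44, 1.24, 1.89, 2.34]
unknown = [2.42, 1.85, 2.00, 2.27, 1.70, 1.47, 2.20]
r, lower, upper = log_potency_ratio(standard, unknown, t_value=2.179)  # P = 0.95, 12 d.f.
print(r, lower, upper)
```

Because the limits are calculated on the log scale they are symmetrical about M, and therefore unsymmetrical about R after taking antilogs.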

14.2. The relation between the individual effective dose and 
all-or-nothing (quantal) responses 

In the sort of experiment described in § 14.1 the individual effective 
dose (IED) just sufficient to produce a given effect is measured directly 
on each individual. For example, the amount of digitalis solution needed 
to produce cardiac arrest can be measured on each of a group of animals 
by giving it as a slow intravenous infusion and observing the volume 
administered at the point when the heart stops. The results given in
Table 14.2.1 are an idealized version of experimental measurements
of 100 individual lethal doses (z) of cocaine cited by J. W. Trevan
(1927). The results have been grouped so that a histogram can be
plotted from them and the percentage of individual effective doses
falling in each dose interval is denoted f. The logarithms (x) of the
doses are also given (1 has been added to each of the values to make
them all positive).

From the results in Table 14.2.1 the mean individual effective dose is
the total of the z values divided by the total number of observations,†

$$\bar{z} = \frac{\Sigma fz}{\Sigma f} = \frac{51.475}{100} \simeq 0.515 \text{ mg}.$$

The median effective dose (dose for p = 50 per cent)
(interpolated from Fig. 14.2.2) ≃ 0.49 mg.                    (14.2.1)

The modal effective dose (interpolated from Fig. 14.2.1) ≃ 0.44 mg.

† This mean is calculated from the grouped results, each IED being assumed to have
the central value of the group in which it falls. If the original ungrouped observations
were available, the mean of these would be preferred. If it is accepted that z is lognormal
(see below) then the mean can also be estimated using the equation on p. 78 with
$\mu = \bar{1}.707$ and $\sigma = 0.104$ from Fig. 14.2.6. This gives
$\text{antilog}_{10}(\bar{1}.707 + 1.1513 \times 0.104^2) = 0.524$ mg.






A histogram of the distribution of the individual effective doses is 
plotted in Fig. 14.2.1 and the estimated mean, median, and modal 
IEDs (see § 2.5) plotted on it. The distribution looks positively skewed 
and therefore, as expected, mean > median > mode (see § 4.5). 

Table 14.2.1

Frequency f = percentage of animals responding in each dose interval.
Cumulative frequency p = total percentage of animals responding to dose
equal to or less than the upper limit of each dose interval. Probits (see
§ 14.3) were obtained from Fisher and Yates tables (1963, Table IX,
p. 69). The p values are found as the cumulative sum of the observed f
values. For example 54 = 38 + 15 + 1.

Dose interval      Mid-point   log dose
(mg of cocaine)    (z)         interval + 1 (x)    f      p       fz       Probit of p
0-0.2              0.10        −∞ to 0.301         0      0       0        −∞
0.2-0.3            0.25        0.301-0.477         1      1       0.25     2.674
0.3-0.4            0.35        0.477-0.602         15     16      5.25     4.005
0.4-0.5            0.45        0.602-0.699         38     54      17.10    5.100
0.5-0.6            0.55        0.699-0.778         25     79      13.75    5.806
0.6-0.7            0.65        0.778-0.845         11     90      7.15     6.282
0.7-0.8            0.75        0.845-0.903         6.5    96.5    4.88     6.812
0.8-0.9            0.85        0.903-0.954         2.5    99      2.125    7.326
0.9-1.0            0.95        0.954-1.000         1      100     0.95     +∞

Totals                                             Σf = 100       Σfz = 51.475
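The derived columns of Table 14.2.1 can be regenerated from the mid-points and the f values alone. The sketch below, in Python, is an illustration only: it assumes the standard library's `statistics.NormalDist` as a stand-in for the printed probit tables, and the names used are not from the book. The grouped mean it gives agrees with the 0.515 mg quoted in the text to within rounding.

```python
from itertools import accumulate
from statistics import NormalDist

# Mid-points (mg) and percentage frequencies f from Table 14.2.1.
midpoints = [0.10, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
f = [0, 1, 15, 38, 25, 11, 6.5, 2.5, 1]

# Grouped mean: each IED is given the central value of its group.
mean_ied = sum(fi * zi for fi, zi in zip(f, midpoints)) / sum(f)

# The cumulative percentages p are the running sum of f.
p = list(accumulate(f))

def probit(percent):
    """Probit = normal equivalent deviation + 5 (infinite at 0 and 100 per cent)."""
    if percent <= 0 or percent >= 100:
        return None
    return NormalDist().inv_cdf(percent / 100) + 5

probits = [probit(pi) for pi in p]
for pi, yi in zip(p, probits):
    print(pi, None if yi is None else round(yi, 3))
```

The probits at 0 and 100 per cent are returned as `None` because the corresponding normal equivalent deviations are −∞ and +∞, as shown in the table.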





The individual effective dose is of course a continuous variable 
and the distribution of IEDs is a continuous distribution (see § 4.1). 
However, in order to get an idea of the shape of the distribution it has 
been necessary to group the observed IEDs so that the histogram in 
Fig. 14.2.1 can be plotted. A continuous line has been drawn by eye 
through the histogram as an estimate of what the actual continuous 
distribution should look like. 

In Fig. 14.2.2 the histogram of cumulative frequency (p) is plotted
against dose. When a continuous line is drawn through the top right-hand
corner (see below) of each block an unsymmetrical sigmoid curve
is obtained. This is the cumulative distribution, or distribution
function, F(z) (defined in (4.1.4)), corresponding to the distribution of IED
shown in Fig. 14.2.1. That is to say the ordinate of the curve in Fig.
14.2.2 for any specified value of the dose, z, is equal to the area under
the curve in Fig. 14.2.1 below z (cf. Fig. 5.1.2 and its cumulative form,
Fig. 5.1.1, and Figs. 4.1.3 and 4.1.4).

The relation between the IED and quantal responses can now be
illustrated. When a quantal response is obtained the IED itself is not
measured. A fixed dose, z, of drug is given to a group of n subjects
and the number, r, of subjects showing the chosen response is observed.
The proportion of subjects responding in the group is r/n. This is, of
course, a discontinuous variable; and if the same dose were given
repeatedly to many groups of n subjects then the number showing a
response, r, would be expected to vary from trial to trial according to
the discontinuous binomial distribution (see §§ 3.2-3.4). The subjects
that respond will be those in the group whose IED is equal to or less than
the dose given, z. Therefore if the doses chosen were the upper limits
of each interval in Table 14.2.1 (i.e. doses of 0.2, 0.3, 0.4, ..., 1.0 mg)
then the values of r/n observed in each of the 9 groups of animals
would be the same (apart from experimental error) as the values of p
in the table (which is why p and probit[p] are plotted against the upper
limits of each dose interval in Figs. 14.2.2, 14.2.4, 14.2.5, and 14.2.6).

Fig. 14.2.1. Histogram of the individual effective dose measurements in
Table 14.2.1. The frequency, f, is plotted against dose (z, in mg). The continuous line
has been drawn by eye through the histogram as an estimate of the true
(continuous) distribution of individual effective doses. The distribution is skew so
the median effective dose (shaded area = 50 per cent of total area under curve) is
less than the mean but greater than the modal effective dose (see § 4.5). Marked on
the figure: mode ≃ 0.44 mg, median ≃ 0.49 mg, arithmetic mean ≃ 0.52 mg.



Fig. 14.2.2. Results from Table 14.2.1. The histogram is plotted using the
cumulative frequency p, against dose z (in mg). The blocks, each of height f, from Fig.
14.2.1, have been put above each other so that the total height is p. The sigmoid
curve has been drawn by eye through the top right-hand corner of each block
(see text) as an estimate of the true (continuous) cumulative distribution (i.e.
the distribution function, see § 4.1) of individual effective doses, i.e. the ordinate
is the percentage of animals with an individual effective dose equal to or less
than z.



So if the quantal responses (values of r/n) were plotted against the
dose an unsymmetrical sigmoid dose-response curve like the continuous
line in Fig. 14.2.2 would be expected.

Thus when quantal responses are measured the dose is fixed by
the experimenter and the number (or proportion) of subjects responding
is the variable measured. On the other hand, in direct assays the dose
is not fixed but is the variable quantity measured by the experimenter.

The subjects responding in the quantal experiment are the subjects
in the group with an IED equal to or less than the fixed dose given. No
information is obtained about the IEDs of individual animals so Fig. 14.2.1
cannot be plotted directly (though it can of course be obtained by
plotting the slope of the quantal dose-response curve, Fig. 14.2.2,
against dose, i.e. by differentiation of Fig. 14.2.2; this was shown in
(4.1.5)).

The cumulative curve in Fig. 14.2.2 is analogous to an ordinary
dose-response curve, for example the tension (a continuous variable)
developed by a smooth muscle preparation in response to various
doses of histamine. Because it is easier to handle a straight line than a
curve, it is usual to look for ways of converting dose-response curves to
straight lines. A method of doing this that often works in the case of
quantal responses is discussed in § 14.3, and illustrated in Figs.
14.2.3-14.2.6, which show various manipulations of the original results.

Fig. 14.2.3. Results from Table 14.2.1. Histogram of individual effective
dose measurements with dose on a logarithmic scale (abscissa: 1 + log z, or 1 + x),
rather than on an arithmetic scale as in Fig. 14.2.1 (1 has been added to the logs
to avoid negative values). Now that the blocks of the histogram are not of equal
width, their area is no longer proportional to their height, so a convention must
be adopted as to whether area or height shall represent frequency.

(a) In this figure height represents frequency, i.e. frequency (left-hand scale)
is plotted against log dose. The heights of the blocks are as in Fig. 14.2.1. The
continuous curve is a Gaussian (normal) distribution, calculated using the mean
of the log individual effective doses ($\bar{1}.707$), and the standard deviation of the
log IED (0.104), estimated from Fig. 14.2.6 as described in the text. The
probability density (right-hand) scale has been chosen to make the areas under
histogram and the Gaussian curve equal, but the two are still not comparable
because area represents frequency for the continuous curve (see § 4.1), but not
for the histogram.

(b) The histogram in this figure has been constructed so that the area of each
block represents frequency, f, or, more precisely, the proportion f/Σf = f/100
in this example. The area is the height (h say) times the width of the log dose
interval (Δx say). For example, the first and last blocks represent a frequency of
f = 1 per cent (see Table 14.2.1) so the first and last blocks are of equal height in
Figs. 14.2.1 and 14.2.3(a). However, in Fig. 14.2.3(b) they have equal areas (each
has 1 per cent of the total area), and therefore unequal heights. By definition,
proportion = f/100 = hΔx = area. For example, for the first block Δx = 0.477
− 0.301 = 0.176, so the height (probability density) is h = f/100Δx = 1/17.6
= 0.05682, as plotted. For the last block Δx = 1.0 − 0.954 = 0.046, so h =
f/100Δx = 1/4.6 = 0.2174, as plotted.

The area convention shown in Fig. 14.2.3(b) is the preferable one, because it
shows the shape of the distribution correctly when the widths of the groups are
not equal (though only at the expense of making it not obvious when frequencies
are equal, because it is more difficult to judge relative areas than relative heights
by eye). The continuous curve is a Gaussian curve with the same mean and
standard deviation as in Fig. 14.2.3(a), and it can now be compared directly
with the histogram because both have been plotted using the same (area)
convention (see § 4.1), and both have a total area of 1.0. The Gaussian curve is seen
to fit the observations reasonably well.

Fig. 14.2.4. Results from Table 14.2.1. Cumulative frequency, p, plotted
against log dose (x = log z; abscissa: 1 + log z). This figure is related to Fig. 14.2.3
in the same way as Fig. 14.2.2 is related to Fig. 14.2.1.

The blocks (observations) are each the height of the f values in Table 14.2.1,
and they are put above each other so the total height gives the p value from
Table 14.2.1. The blocks are the same as those in Fig. 14.2.3(a), and the height
of each block is proportional to the area of each block (i.e. the frequency, f) in
Fig. 14.2.3(b).

The continuous curve is an estimate of the true (continuous) cumulative
distribution (i.e. the distribution function, see § 4.1) of log IED values. In other
words, the ordinate is the percentage of animals with a log IED equal to or less
than x. The continuous curve in this figure is related to that in Fig. 14.2.3(b)
in exactly the same way as the blocks are related; it is a calculated Gaussian
distribution function (see § 4.1 and text). The ordinate is the area to the left of x
under the calculated Gaussian distribution in Fig. 14.2.3(b), just as the ordinate
for the blocks is the total area of the blocks below x under the histogram in
Fig. 14.2.3(b). The calculated Gaussian function fits quite well (the continuous
curve in Fig. 14.2.2 fits exactly only because it was drawn through the observations
by eye).

Fig. 14.2.5. Results from Table 14.2.1. Plot of the probit of p against the
dose (z, in mg). The corresponding percentage scale is shown on the right for comparison.
The non-linearity indicates that IED values are not normally distributed. A
smooth curve has been drawn through the points by eye and the median IED
(p = 50 per cent, probit[p] = 5) is estimated to be 0.49 mg, as was also found
by interpolation in Fig. 14.2.2 (cf. Fig. 14.2.6, which gives a slightly different
estimate).



14.3. The probit transformation. Linearization of the quantal 
dose response curve 

When dealing with continuously variable responses it is common
practice to plot the response against the logarithm of the dose (x
= log z, say) in the hope that this will produce a reasonably straight
dose-response line. If p (from Table 14.2.1) is plotted against the log
dose, the result, shown in Fig. 14.2.4, is not a straight line but a
symmetrical sigmoid curve. (In fact similar results are often observed
with continuous responses also.)

A way of converting the results to a straight line is suggested by
Fig. 14.2.3, in which f rather than p is plotted against log dose. The
histogram has become roughly symmetrical compared with the skewed
distribution of IEDs seen in Fig. 14.2.1. The continuous line in Fig.
14.2.3 is a calculated normal (Gaussian) distribution with a mean
and standard deviation estimated as described below and illustrated
in Fig. 14.2.6. The calculated normal distribution is seen to fit the
observed histogram quite well, suggesting that the logarithms of the
IEDs (values of x = log z) are normally distributed, i.e. that the
IEDs (values of z) are lognormally distributed (see § 4.5). Any curve
can be linearized if the mathematical formula describing it is known.
The sigmoid curve in Fig. 14.2.4, the cumulative form of the
distribution in Fig. 14.2.3, is a cumulative normal distribution. This was
illustrated in Fig. 4.1.4, which shows the cumulative form, p = F(x),
of the normal distribution in Fig. 4.1.3. If the abscissa in Fig. 4.1.4 is
some measure of the effective dose then the ordinate of the cumulative
normal distribution is

$$p = F(x) = \text{area under normal curve below } x = \text{proportion of animals for which IED} \le x, \qquad (14.3.1)$$

i.e. exactly what is plotted as the ordinate in Figs. 14.2.2 and 14.2.4.

The formula for the integral normal curve shown in Fig. 4.1.4,
from (4.1.4) and (4.2.1), is

$$p = F(x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] dx. \qquad (14.3.2)$$

This curve can be transformed to a straight line if, instead of plotting
p against x, the abscissa corresponding to p is read off from a standard
normal curve (see § 4.3) and this is plotted against x. For example,
if a dose of x = 3 produces an effect in 16 per cent of a group of animals,
the abscissa (viz. u = −1, see § 4.3) of the standard normal curve
corresponding to an area in the lower tail of the curve of 16 per cent
would be read off as shown in Fig. 14.3.1. This value of the abscissa
would then be plotted against the dose (or some transformation of it,
such as the logarithm of the dose), as shown in Fig. 14.3.2.

Fig. 14.2.6. Results from Table 14.2.1. Plot of the probit of p against log
dose (x = log z; abscissa: 1 + log10 z). The graph is reasonably straight, indicating
that log IED values are approximately normally distributed (i.e. IED values are
approximately lognormal, see § 4.5). The reciprocal of the slope (1/9.60 = 0.104)
estimates the standard deviation of the normal distribution of log IED values, and
the dose corresponding to p = 50 per cent (probit[p] = 5), i.e. antilog $\bar{1}.707$ = 0.509 mg,
estimates the median (= mean = mode) of the distribution of log IED values.
The distribution plotted with this mean and standard deviation is shown in
Fig. 14.2.3. The estimate of the median effective dose from this plot, 0.509 mg,
is different from that obtained from Figs. 14.2.2 and 14.2.5 (0.49 mg). This is
because a straight line has been drawn in this figure, using all the points; and
the dose corresponding to probit[p] = 5 has been interpolated from the straight
line even though it does not go exactly through the points. This would be the
best procedure if the true line were in fact straight (i.e. if the population of log
IED values were in fact Gaussian). In Figs. 14.2.2 and 14.2.5, curves were
drawn by eye to go exactly through all the points, so effectively only the
observations on each side of probit[p] = 5 were being used for interpolation of the
median, whereas when a straight line (or other specified function) is fitted, all the
observations are taken into account. In a real quantal experiment the
line would be fitted to the points in this figure using the iterative method
discussed in § 14.4. In this example, the quantal data has been generated, for
illustrative purposes, using actual IED measurements rather than by giving
fixed doses to groups of animals, so the best that can be done is to fit an
unweighted straight line (shown) as described in § 12.5.

The abscissa of the standard normal curve is, as described in § 4.3,
$u = (x - \mu)/\sigma$, where σ is the standard deviation of x (i.e. of the log
IED in the present case). So in effect, instead of plotting p against x,
the value of u corresponding to p (which is called the normal equivalent
deviation or NED) is plotted against x. But because the relation between
u and x,

$$u = \frac{x - \mu}{\sigma} = \left(\frac{1}{\sigma}\right)x - \left(\frac{\mu}{\sigma}\right), \qquad (14.3.3)$$

has the form of the general equation for a straight line, u = bx + a, the
plot of NED against x will be a straight line with slope 1/σ and intercept
(−μ/σ) if, and only if, the values of x are normally distributed. This
is because the NEDs corresponding to the observed p were read from a
normal distribution curve.

The values of u are negative for p < 50 per cent response and so,
to avoid the inconvenience of handling negative values, 5.0 is added to
all values of the NED and the result is called the probit corresponding
to p, or probit[p]. Tables of the probit transformation are given, for
example, by Fisher and Yates (1963, Table IX, p. 68). From Fig.
14.3.1, it is seen that p = 50 per cent response corresponds to u = NED
= 0, i.e. probit[50 per cent] = 5. Thus

$$\text{probit}[p] = u + 5 = \text{NED} + 5 = 5 + \frac{x - \mu}{\sigma} = \left(\frac{1}{\sigma}\right)x + \left(5 - \frac{\mu}{\sigma}\right), \qquad (14.3.4)$$

so the plot of probit[p] against x will be a straight line (if x is Gaussian)
with slope 1/σ (as above) and intercept (5 − μ/σ). Here, as above, σ is the
standard deviation of the distribution of x, i.e. of the log IED in the
present case. It is therefore a measure of the heterogeneity of the
subjects (see § 14.4 also).

From Fig. 14.3.1, it can be seen that the NED of a 16 per cent
response (i.e. 16 per cent of individuals affected) is −1 or, in other
words, the probit of a 16 per cent response is +4. This follows from the
fact (see § 4.3) that about 68 per cent of the area under a normal
distribution curve is within ±σ (i.e. within ±1 on a standard normal
curve), and of the remaining 32 per cent of the area, 16 per cent is
below u = −1 and 16 per cent is above +1.

Fig. 14.3.1. Standard Gaussian (normal) distribution (see Chapter 4).
Sixteen per cent of individuals responding corresponds to a value of u of −1
(the NED), i.e. to a probit (u + 5) of 4.

Fig. 14.3.2. If the dose (or transformed dose, e.g. log dose) x = 3 caused
16 per cent of individuals to respond, the probit of 16 per cent, i.e. 4.0, from
Fig. 14.3.1, would be plotted against x = 3. See complete plots in Figs. 14.2.5 and
14.2.6.

In Fig. 14.2.5 the probit of the percentage response is plotted against 
the dose z. The curve is not straight, implying that individual effective 
doses do not follow the Gaussian distribution in the animal population. 
This has already been inferred by inspection of the distribution shown 
in Fig. 14.2.1 which is clearly skew. However, in the usual quantal
response experiment the distribution in Fig. 14.2.1 is not itself observed.
The directly observed results are of the form shown in Fig. 14.2.2; and
it is not immediately obvious from Fig. 14.2.2 that individual effective 
doses are not normally distributed. When the probit of the percentage 
response is plotted against log dose, in Fig. 14.2.6, the line is seen to be 
approximately straight, showing that (in this particular instance) the 
logarithms of the individual effective doses are approximately normally 
distributed (cf. Fig. 14.2.3). 

The use of this line is discussed in the next section. 
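The linearizing property of the probit can be illustrated numerically. The following Python sketch is illustrative only: it uses the standard library's `statistics.NormalDist` in place of printed tables, and the values of `mu` and `sigma` are assumed for the purpose of the example. If x is Gaussian, the probit of the cumulative proportion should reproduce the straight line of (14.3.4).

```python
from statistics import NormalDist

def probit(p):
    """Probit of a proportion p (0 < p < 1): the NED plus 5."""
    return NormalDist().inv_cdf(p) + 5.0

# Assumed, purely illustrative parameters of the distribution of x.
mu, sigma = 0.707, 0.104
x_distribution = NormalDist(mu, sigma)

for x in (0.5, 0.6, 0.707, 0.8, 0.9):
    p = x_distribution.cdf(x)                  # proportion with IED <= x, as in (14.3.1)
    straight_line = 5.0 + (x - mu) / sigma     # right-hand side of (14.3.4)
    print(x, round(p, 4), round(probit(p), 3), round(straight_line, 3))
```

The last two columns printed agree, which is simply the statement that a cumulative normal curve becomes a straight line with slope 1/σ on the probit scale.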


14.4. Probit curves. Estimation of the median effective dose and 
quantal assays 

The probit transformation described in § 14.3 can be used to estimate
the median effective dose or concentration; that is, the dose (or
concentration) estimated to produce the effect in 50 per cent of the
population of individuals (see § 2.5). This dose is referred to as the ED50
(or, if the effect happens to be death, as the LD50, the median lethal
dose). If the individual effective doses (IEDs) are measured on a
scale on which they are normally distributed the ED50 will be the
same as the mean effective dose (e.g. in § 14.3 the median effective log
dose is the same as the mean effective log dose, see § 4.6 and Figs. 14.2.1
and 14.2.3).

The procedure is to measure the proportion (p = r/n) of individuals
showing the effect in response to each of a range of doses. The
proportions are converted to probits and plotted against the dose or some
function (usually the logarithm) of the dose. If the curve does not
deviate from a straight line by more than is reasonable on the basis of
experimental error (see below) the ED50 can be read from the fitted
straight line. From Fig. 14.2.6 it can be seen that the graphical estimate
of the ED50 is 0.509 mg, the antilog of the log dose corresponding to a
probit of 5 as explained in § 14.3.

Furthermore, the slope of Fig. 14.2.6 is an estimate of 1/σ, according
to (14.3.4); so the reciprocal of this slope (i.e. 1/9.60 ≃ 0.104 log10
units, from Fig. 14.2.6) is an estimate of σ, the standard deviation of
the distribution of the log IED, and this value of σ was used to plot
the distribution in Fig. 14.2.3. This standard deviation is a measure of
the variability of the individuals, i.e. of the extent to which they do not
all have the same individual effective dose.
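The graphical estimates quoted above can be approximated by fitting an ordinary unweighted straight line to the probits of Table 14.2.1. The Python sketch below is an illustration under stated assumptions: probits are computed with the standard library's `statistics.NormalDist` rather than read from tables, the infinite probits at 0 and 100 per cent are omitted, and the least-squares line is unweighted (it is not the iterative weighted fit needed for real quantal data). The lognormal mean uses the relation quoted in the footnote to (14.2.1), in which 1.1513 is (ln 10)/2.

```python
import math
from statistics import NormalDist, mean

# Upper limits of the dose intervals (mg) and cumulative percentages p
# from Table 14.2.1, omitting p = 0 and p = 100.
upper_dose = [0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
p_percent = [1, 16, 54, 79, 90, 96.5, 99]

x = [math.log10(z) for z in upper_dose]                      # log dose
y = [NormalDist().inv_cdf(p / 100) + 5 for p in p_percent]   # probit[p]

# Ordinary (unweighted) least-squares straight line y = a + b x.
x_bar, y_bar = mean(x), mean(y)
b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
     / sum((xi - x_bar) ** 2 for xi in x))
a = y_bar - b * x_bar

sigma = 1 / b                  # estimated s.d. of the log IED
log_ed50 = (5 - a) / b         # log dose at which probit[p] = 5
ed50 = 10 ** log_ed50
# Mean of a lognormal variable when the logs are to base 10.
lognormal_mean = 10 ** (log_ed50 + (math.log(10) / 2) * sigma ** 2)

print(round(b, 2), round(sigma, 3), round(ed50, 3), round(lognormal_mean, 3))
```

Working with log10 z rather than 1 + log10 z shifts the abscissa by one unit but leaves the slope, and hence the estimate of σ, unchanged.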

Assays of the sort described in Chapter 13 can also be done using 
quantal responses, and if a log dose scale is used they will be parallel 
line assays (see § 13.1). 

In all the applications discussed the problem arises of how to fit the
'best' line to the observed points. Methods for doing this have been
described in Chapters 12 and 13 but they all assume that the scatter of
the observations is the same at every value of x, i.e. that the results
are homoscedastic (see §§ 12.2 and 13.1). This is not the case for probit
plots (see § 14.6 for an exception) and this complicates the process of
curve fitting. Numerical examples of the methods are given by Burn,
Finney, and Goodwin (1950, p. 114), and Finney (1964, Chapters
17-21).

The reason for the heteroscedasticity is not difficult to see. The
number of individuals (r) responding, out of a randomly selected (notice
that random selection is, as usual, essential for the analysis) group of
n, should follow the binomial distribution (§§ 3.2-3.4), and the variance
of the proportion responding, p = r/n, would be estimated from
(3.4.5) to be var[p] = p(1 − p)/n. Because the line is to be fitted to the
plot of probit[p] (= y, say) against dose metameter, it is the variance of
y = probit[p] that is of interest. From (2.7.13) it is seen that
$\operatorname{var}[y] \simeq \operatorname{var}[p]\,(dy/dp)^2 = (dy/dp)^2\,p(1-p)/n$. Now the standard normal
curve in Fig. 14.3.1 can be written (by (4.1.1)) as dp = f dy, and thus
dy/dp = 1/f, where f is the ordinate of the standard normal curve
(the probability density, see § 4.1 and (4.2.1); f was used with a different
meaning in §§ 14.2 and 14.3). This result follows, slightly more rigorously,
from (14.3.1) and (4.1.6). Therefore $\operatorname{var}[y] \simeq p(1-p)/nf^2$, and this is
not a constant but varies with p. The probit plot is therefore
heteroscedastic and each probit (y value) must be given a weight
$1/\operatorname{var}[y] \simeq nf^2/p(1-p)$ when fitting the dose-response lines (cf. §§ 2.6 and 13.4);
it is this that gives rise to the complications. When a line is fitted it will
lead to a better estimate of the y corresponding to each x, and hence to
better estimates of the weights and hence to a better fitting line. The
calculation is therefore iterative.

It is because of the existence of this theoretical estimate (cf. § 3.7)
of var[y] that the deviations from linearity of Fig. 14.2.6 can be tested
even though there is only one observation (y value) at each x value
(cf. §§ 12.5 and 12.6).

If the weight is plotted against p (Fisher and Yates (1963, p. 71)
give a table of $f^2/p(1-p)$), it is found to have a maximum value when
p = 0.5, i.e. 50 per cent response rate. This is the reason why the
ED50 is calculated as a measure of effectiveness. It is the quantity that
can be determined most precisely.

The minimum effective dose 

This term is fairly obviously meaningless† as it stands (unless the 
IED is the same for all individuals). The larger the sample the larger 
the chance that it will contain a very susceptible individual (from the 
lower tail of Fig. 14.2.3) so the lower the estimate of the minimum 
effective dose will be. Clearly it is necessary to specify the proportion of 
individuals affected in the population. It was 50 per cent in the discus- 
sion above. 

Unfortunately, it is often not of interest to know the ED50. If 
one were interested in the proportion of individuals suffering harmful 
radiation effects from the fall-out of nuclear explosions it is (or should 
be) only of secondary interest to know what dose of radiation will 
harm 50 per cent of the population. What is required is an estimate of 
the dose of radiation that will not harm anyone. No answer other than 
zero dose is consistent with the lognormal distribution of individual 
effective radiation doses usually assumed, because the normal distribu- 
tion of the log IED is asymptotic to the dose axis (see § 4.2), zero 
effect being produced only by log dose = −∞, i.e. zero dose. The 
question is not compatible with a normal distribution of doses either, 
as this would imply the existence of negative doses. This is a very real 
problem because when dealing with a very large population a very small 
proportion harmed means a very large number of people harmed. Suppose 
that it is decided that the ED0.01 shall be estimated, i.e. the dose 
affecting 0.01 per cent of the population (about 0.0001 × 3500 million 
= 350 000 people on a world scale!). The weight, f²/p(1 − p), correspond- 
ing to p = 0.0001 (i.e. probit[p] ≈ 1.3) is seen from the tables to be 
0.00167, compared with 0.6366 (about 380 times larger) for p = 0.5. 
Thus to estimate the ED0.01 with the same precision as the ED50, the 

† This necessitates the abandonment of the conventional definition of the unit of 
beauty (Marlowe 1604), viz. one milliHelen, that quantity of beauty just sufficient 
to launch a single ship. An alternative definition could follow the lines of the purity in 
heart index discussed in § 7.8. 






sample size (n) would have to be much bigger. And this is not the 
only problem. Working with small proportions means working with 
the tails of the distribution where assumptions about its form are least 
reliable. For example, the straight line in Fig. 14.2.6 might be extra- 
polated — a very hazardous process, as was shown in § 12.5, Fig. 12.5.1. 
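
The loss of precision in the tail can be sketched numerically. The snippet below assumes the classical convention that a probit is the normal equivalent deviate plus 5 (consistent with probit ≈ 1.3 for p = 0.0001), and evaluates the weighting coefficient f²/p(1 − p) at p = 0.5, at the rounded probit 1.3, and at p exactly 0.0001. The function name is hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def weight_at_deviate(z):
    """Weighting coefficient f^2 / (p (1 - p)) at normal equivalent deviate z."""
    p = STD_NORMAL.cdf(z)
    f = STD_NORMAL.pdf(z)
    return f * f / (p * (1.0 - p))


w_50 = weight_at_deviate(0.0)                             # p = 0.5
w_probit_1_3 = weight_at_deviate(1.3 - 5.0)               # rounded probit 1.3
w_exact = weight_at_deviate(STD_NORMAL.inv_cdf(0.0001))   # p exactly 0.0001

print(f"{w_50:.4f}  {w_probit_1_3:.5f}  {w_exact:.5f}")
print(f"ratio at probit 1.3: {w_50 / w_probit_1_3:.0f}")
```

Because the weight of an observation is n times this coefficient, a group observed near the 0.01 per cent point would need to be some hundreds of times larger than one observed near the 50 per cent point to carry the same weight. The unrounded value of p gives a slightly smaller coefficient than the rounded probit, so the figure of 380 is best read as an order of magnitude.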

14.5. Use of the probit transformation to linearize other sorts 
of sigmoid curve 

The probit transformation can be tried quite empirically in attempt- 
ing to linearize any sigmoid curve if the ordinate can be expressed as a 
proportion (see § 14.6). If this is done the variability will not usually be 
binomial in origin so the method discussed in § 14.4 cannot be used, 
and curves should not be fitted by the methods described in books on 
quantal bioassay. It would be necessary to have several observations at 
each x value to test for deviations from linearity, and the assumptions 
discussed in § 12.2 must be tested empirically. 

An example is provided by the osmotic lysis of red blood cells by 
dilute salt solutions. It is often found that the plot of the probit of the 
proportion (p) of cells not haemolysed against salt concentration 
(not log concentration) is straight over a large part of its length. 
This implies (see § 14.3) that the concentration of salt just sufficient to 
prevent lysis of individual cells (the IED) is approximately normally 
distributed in the population of cells with a standard deviation esti- 
mated, by (14.3.4), as the reciprocal of the slope of the plot. In this 
sort of experiment each test would usually be done on a very large 
number (n) of cells so the variability expected from the binomial 
distribution, p(1 − p)/n, would be very small. However, in this case 
most of the variability (random scatter of observations about the 
probit-concentration line) would not be binomial in origin but would 
be the result of factors that do not enter into the sort of animal experi- 
ment described in § 14.3, such as variability of n from sample to sample, 
and errors in counting the number of cells showing the specified response 
(not lysed). 

14.6. Logits and other transformations. Relationship with the 
Michaelis-Menten hyperbola 

The use of the probit transformation for linearizing quantal dose 
response curves is, in real life, completely empirical. The probit of the 
proportion responding is plotted against some function (x, say) of the 
dose, and the plot tested for deviations from linearity. However, there 






are many other curves that closely resemble the sigmoid cumulative 
normal curve in Figs. 4.1.4 and 14.2.4. One example is the logistic 
defined† by 

$$p = \frac{1}{1 + e^{-(a+bx)}} \tag{14.6.1}$$



This is plotted in Fig. 14.6.1, curve (b), and is seen to be very like the 
cumulative normal curve. If the relation between p and x was repre- 
sented by (14.6.1) then it could be linearized by plotting logit[p] 




[Figure 14.6.1. Ordinate: p. Abscissa for curve (a): z on an ordinary 
arithmetic scale. Abscissa for curve (b): z on a logarithmic scale (2 to 1024), 
shown equivalently as x = log₂ z (1 to 10), x = log₁₀ z (0.3 to 3.0), and 
x = logₑ z (0.693 to 6.93).] 

Fig. 14.6.1. Curve (a): plot of p against z from eqn. (14.6.3). When b = 1 
this curve is part of a hyperbola. 

Curve (b): plot of p against x from eqn. (14.6.1). This curve is the same as 
curve (a) with z plotted on a logarithmic scale (three equivalent ways of plotting 
x = log z are shown). It is a logistic curve and can be linearized by plotting 
logit[p] against x. 

The particular values used to plot the graphs were K = 100, b = 1. 



† An equivalent definition of the logistic curve that may be encountered is 

$$p = \tfrac{1}{2}\left[1 + \tanh\left(\frac{a+bx}{2}\right)\right]$$

where the hyperbolic tangent is defined by $\tanh\phi = (e^{2\phi} - 1)/(e^{2\phi} + 1)$. In terms of this 
definition logit[p] = 2 tanh⁻¹(2p − 1) = a + bx. 




(instead of probit) against x, where logit[p] is defined as logₑ{p/(1 − p)}. 
This follows from (14.6.1) which implies 

$$\mathrm{logit}[p] = \log_e\left(\frac{p}{1-p}\right) = \log_e\left(e^{a+bx}\right) = a + bx \tag{14.6.2}$$

which is a straight line with slope b and intercept a. (Remember that, 
in general, logₑ eˣ = x because the log is defined as the power to which 
the base must be raised to give the argument. This implies, also, that eˣ can be 
written, in general, as antilogₑ x, which is used below in deriving 
(14.6.3).) The use of this and other transformations for analysing 
quantal response experiments is described by Finney (1964). The 
probit and logit transformations are too similar for it to be possible 
to detect which fits the results better, with quantal experiments of the 
usual size. 
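
How similar the two curves are can be sketched numerically. The comparison below rescales the abscissa of the logistic curve by a factor of 1.702, a commonly quoted value that is an assumption here and does not come from the text, and then finds the largest difference between the logistic and the cumulative standard normal curve over a grid.

```python
import math
from statistics import NormalDist

SCALE = 1.702  # assumed slope factor matching the logistic to the standard normal


def logistic(u):
    """Logistic curve p = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


grid = [i / 100.0 for i in range(-500, 501)]   # x from -5 to 5 in steps of 0.01
max_gap = max(abs(NormalDist().cdf(x) - logistic(SCALE * x)) for x in grid)
print(f"largest difference in p: {max_gap:.4f}")
```

A difference of under one per cent in p is small compared with the binomial scatter of proportions observed on groups of ordinary size.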

The logit transformation is also a linearizing transformation for 
the hyperbola discussed in § 12.8, and plotted in Fig. 14.6.1, curve 
(a). In this application the response, y, is a continuous variable, not a 
quantal variable. The linearity follows by taking x = logₑ z (using z 
to represent dose, or concentration, rather than x which was used for 
this purpose in § 12.8), and p = $y/y_{\max}$, i.e. y expressed as a proportion 
of its maximum possible value, the value approached as z becomes 
very large (in § 12.8 $y_{\max}$ was called V). If the constant, a, is redefined 
as −logₑ K then putting these quantities into (14.6.1) gives 

$$p = \frac{y}{y_{\max}} = \frac{1}{1 + e^{\log_e K - b\log_e z}} = \frac{1}{1 + K/z^{b}} = \frac{z^{b}}{K + z^{b}} \tag{14.6.3}$$

which is the hyperbola (12.8.1), in the special case when b = 1. As 
mentioned in § 12.8, this is the Michaelis-Menten equation of bio- 
chemistry (when b = 1). The more general form, (14.6.3), has been 
used, for example, in biochemistry and pharmacology (the Hill equa- 
tion). The plot of logit [p] against log z is known as the Hill plot. The 
use and physical interpretation of the Hill plot are discussed by Rang 
and Colquhoun (1973). 
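
The linearization can be checked with a short sketch. It takes (14.6.3) in the form p = z^b/(K + z^b), which is an assumption consistent with redefining a as −logₑ K, and uses the values K = 100 and b = 1 quoted for Fig. 14.6.1. The function names are hypothetical.

```python
import math

K, b = 100.0, 1.0   # values quoted for Fig. 14.6.1


def hill(z, K, b):
    """p = y / y_max, taking (14.6.3) as z^b / (K + z^b)."""
    return z**b / (K + z**b)


def logit(p):
    """logit[p] = log_e(p / (1 - p))."""
    return math.log(p / (1.0 - p))


doses = [2.0**k for k in range(1, 11)]        # z = 2, 4, ..., 1024
xs = [math.log(z) for z in doses]             # x = log_e z
ys = [logit(hill(z, K, b)) for z in doses]    # logit of the proportional response

slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
intercept = ys[0] - slope * xs[0]
print(slope, intercept, -math.log(K))
```

The slope recovers b and the intercept recovers −logₑ K, so the points of the Hill plot lie on a straight line when the response follows this form exactly.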






Summarizing these arguments, if the response y, plotted against 
dose or concentration z, follows (14.6.3) (which in the special case 
b = 1 is the hyperbola plotted in Fig. 14.6.1, curve (a), and in Fig. 
12.8.1), then the response plotted against log concentration, x = log z, 
will be a sigmoid logistic curve defined by (14.6.1) and plotted in Fig. 
14.6.1, curve (b). And logit[$y/y_{\max}$] plotted against x will be a straight 
line with intercept a = −log K, and slope b. Quite empirically, 
equations like (14.6.3) are often found to represent dose-response 
curves in pharmacology reasonably well (the extent to which this 
justifies physical models is discussed by Rang and Colquhoun 
(1973)), so plots of response against log dose are sigmoid like Fig. 
14.6.1, curve (b). The central portion of this sigmoid curve is sufficiently 
nearly straight to be not grossly incompatible with the assumption, 
made in most of Chapter 13, that response is linearly related to log 
dose. 

It is worth noticing that the sigmoid plot of y against x in Figs. 
14.2.4 or 4.1.4 (the cumulative normal curve, linearized by plotting 
probit[p] against x) looks very like the sigmoid plot of y against x in 
Fig. 14.6.1, curve (b) (the logistic curve, linearized by plotting logit 
[p] against x). However, if x is log z, then the corresponding plots of y 
against z (e.g. response against dose, rather than log dose) are quite 
distinct. The corresponding plots are, respectively, that in Fig. 14.2.2 
(the cumulative lognormal distribution, see § 4.5), which has an 
obvious 'foot', i.e. it flattens off at low z values; and the hyperbola in 
Fig. 14.6.1, curve (a), which rises straight from the origin with no 
trace of a 'foot' or 'threshold'. This distinction is effectively concealed 
when a logarithmic scale is used for the abscissa (e.g. dose). 

In order to use the logit transformation for continuously variable 
responses it is necessary to have an estimate of the maximum response, 
$y_{\max}$. This introduces statistical complications (see, for example, 
Finney (1964, pp. 69-70)). A simple solution is not to bother with 
linearizing transformations except as a convenient method for pre- 
liminary assessment and display of results, but to estimate the para- 
meters $y_{\max}$, K, and b directly by the method of least squares as des- 
cribed in § 12.8. 






Appendix 1 

Expectation, variance, and non-experimental bias 



The object of this appendix is to provide a brief account of some rather 
more mathematical ideas which, although they are not necessary for following 
the main body of the book, will be useful to anyone wanting to go further. 
Also some of the following results will be useful in Appendix 2. Further 
explanation will be found, for example, in Brownlee (1966, pp. 51, 57, and 
87), Mood and Graybill (1963, p. 103), or Kendall and Stuart (1963, Chapter 
2). All the ideas discussed in this section require that the distribution of the 
variable be specified. 

A1.1. Expectation — the population mean 

The population mean value of a variable is called its expectation and is 
defined as† 

$$E(x) = \sum_{\text{all } x} x\,P(x) \quad \text{for discontinuous distributions,} \tag{A1.1.1}$$

$$E(x) = \int_{-\infty}^{+\infty} x\,f(x)\,dx \quad \text{for continuous distributions.} \tag{A1.1.2}$$



This can be regarded as the arithmetic mean of an indefinitely large number 
of observations on the variable x, the distribution of which is specified by 
the probability P(x) (discontinuous), or the probability density f(x) (con- 
tinuous), as explained in §§ 3.1 and 4.1. The reasonableness of the definition 
(A1.1.1) is obvious if a large but finite number of observations, N, is considered. 
On the average a proportion P(x) of the observations will have the value x, so the 
number of observations with the value x will be f = NP(x) and the total of 
the f observations will be fx. The total of all N observations will be Σfx 
and their mean will therefore be Σfx/N, which is exactly eqn. (A1.1.1) if 
f/N is substituted for P. The form for continuous variables, (A1.1.2), is just 
the same as (A1.1.1) except that P is replaced by dP = f(x) dx (from 
(4.1.1)), and consequently summation is replaced by integration. 

As a numerical example take the binomial distribution with n = 3 and 
$\mathcal{P}(S)$ = 0.9 from Table 3.2.2 and Fig. 3.2.4. From (A1.1.1) 

$$E(r) = \sum_{r=0}^{3} r\,P(r) = (0 \times 0.001) + (1 \times 0.027) + (2 \times 0.243) + (3 \times 0.729) = 2.7,$$

† If x, considered as a random variable, is denoted $\mathbf{x}$ to distinguish it from x considered 
as an algebraic symbol (as in (4.1.4) and § A2.6), the definition of expectation can be 
written in the preferable form $E(\mathbf{x}) = \sum x\,P(x)$, etc. 




which is n$\mathcal{P}$ (= 3 × 0.9 = 2.7), the population mean value of r (number 
of successes in three trials), as mentioned in § 3.4. Notice that in this case 
the mean value is never actually observed. All observations must be integers. 
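
The arithmetic of this example can be reproduced with a short sketch of definition (A1.1.1), taking the binomial probabilities for n = 3 and a success probability of 0.9. The variable names are chosen here for illustration.

```python
from math import comb

n, prob = 3, 0.9   # three trials, probability of success 0.9

# Binomial probabilities P(r) for r = 0, 1, 2, 3 successes.
P = {r: comb(n, r) * prob**r * (1 - prob)**(n - r) for r in range(n + 1)}

# Expectation as the probability-weighted sum of the possible values.
mean_r = sum(r * P[r] for r in P)
print(P, mean_r)
```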

Several properties follow directly from the definition of expectation. 
For example, for a linear function, where a and b are constants, 

$$E[a + bx] = E[a] + E[bx] = a + bE[x] \tag{A1.1.3}$$

and also, more generally, 

$$E\left[\sum_{i=1}^{N} x_i\right] = \sum_{i=1}^{N} \left(E[x_i]\right) \quad \text{in general} \tag{A1.1.4}$$

$$\qquad = N\,E[x] \quad \text{if all } x_i \text{ have the same mean.} \tag{A1.1.5}$$

But for a nonlinear function, g(x) say,† 

$$E[g(x)] \neq g(E[x]), \tag{A1.1.6}$$

so averaging a function of x will not give the same answer as averaging 
x first and then finding the function of the average (cf. (2.5.4); the arithmetic 
mean of log x is not the log of the arithmetic mean of x, but the log geometric 
mean). See also (A1.2.2). 
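
A small numerical sketch of this point, using three hypothetical values treated as equally probable, shows that the mean of the logs differs from the log of the mean and corresponds instead to the geometric mean.

```python
import math

x = [1.0, 10.0, 100.0]   # hypothetical observations, equally weighted

mean_of_log = sum(math.log10(v) for v in x) / len(x)   # E[g(x)] with g = log10
log_of_mean = math.log10(sum(x) / len(x))              # g(E[x])
geometric_mean = 10 ** mean_of_log                     # antilog of the mean log

print(mean_of_log, log_of_mean, geometric_mean)
```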

Mean of the Poisson distribution 

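It was stated in §§ 3.5 and 5.1 that m in (3.5.1) was the mean of the 
Poisson distribution. This follows, using (A1.1.1) and (3.5.1), giving 

$$E(r) = \sum_{r=0}^{\infty} r\,\frac{m^{r}}{r!}\,e^{-m} = e^{-m}\left[0 + m\left(1 + \frac{m}{1!} + \frac{m^{2}}{2!} + \dots\right)\right] = e^{-m}\cdot m\cdot e^{m} = m. \tag{A1.1.7}$$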

Mean of the normal distribution 

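Using (A1.1.2) the statement that the parameter μ in (4.2.1) can be inter- 
preted as the mean of the normal distribution can be justified. From (A1.1.2), 

$$\begin{aligned}
E(x) &= \int_{-\infty}^{+\infty} x\,f(x)\,dx = \int_{-\infty}^{+\infty} [\mu + (x-\mu)]\,f(x)\,dx \\
&= \mu\int_{-\infty}^{+\infty} f(x)\,dx + \int_{-\infty}^{+\infty} (x-\mu)\,f(x)\,dx \\
&= \mu + 0 = \mu
\end{aligned} \tag{A1.1.8}$$

† The expectation of a function of x is defined in (A1.1.16) and (A1.1.17). 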




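because the first integral is the area under the whole distribution curve, 
i.e. 1. The second integral is zero because, using (4.2.1) for the density, f(x), 
of the normal distribution and putting y = (x − μ)² so that dy = 2(x − μ) dx, 
it becomes 

$$\int_{-\infty}^{+\infty} (x-\mu)\,\frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^{2}/2\sigma^{2}}\,dx = \frac{1}{2\sigma\sqrt{2\pi}}\int e^{-y/2\sigma^{2}}\,dy = -\frac{\sigma}{\sqrt{2\pi}}\left[e^{-(x-\mu)^{2}/2\sigma^{2}}\right]_{-\infty}^{+\infty} = 0. \tag{A1.1.9}$$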

Mean and median of the exponential distribution 

The exponential distribution of intervals between random events was 
introduced in Chapter 5 and is discussed in more detail in Appendix 2. It 
was defined in (5.1.3) by the probability density 

$$f(x) = \lambda e^{-\lambda x} \ \text{ for } x \ge 0, \qquad f(x) = 0 \ \text{ for } x < 0, \tag{A1.1.10}$$

which is plotted in Fig. 5.1.2. It was argued in § 5.1 that the population 
mean interval between events must be λ⁻¹. This follows from (A1.1.2) which 
gives 

$$E(x) = \int_{0}^{\infty} x\,\lambda e^{-\lambda x}\,dx.$$

The lower limit can be taken as 0 rather than −∞ because, from (A1.1.10), 
f(x), and hence the integral, is 0 for x < 0. This can be evaluated using inte- 
gration by parts (see, for example, Thompson (1965, p. 188), or Massey and 
Kestelman (1964, pp. 332 and 402)). Putting u = x, so du = dx, and 
$dv = \lambda e^{-\lambda x}\,dx$ so $v = \int \lambda e^{-\lambda x}\,dx = -e^{-\lambda x}$, gives 

$$E(x) = \int u\,dv = [uv] - \int v\,du = \left[-x e^{-\lambda x}\right]_{0}^{\infty} - \int_{0}^{\infty} \left(-e^{-\lambda x}\right) dx = 0 + \left[-\frac{e^{-\lambda x}}{\lambda}\right]_{0}^{\infty} = \frac{1}{\lambda} = \lambda^{-1}. \tag{A1.1.11}$$

To evaluate this notice that $x e^{-\lambda x} \to 0$ as $x \to \infty$; see, for example, Massey 
and Kestelman (1964, p. 122). The area under the distribution curve up to 
any value x, i.e. the probability that an interval is equal to or less than x 
is, from (5.1.4), 

$$F(x) = 1 - e^{-\lambda x} \tag{A1.1.12}$$

so the proportion of all intervals that are shorter than the mean interval, 
putting x = λ⁻¹ in (A1.1.12), is 

$$F(\lambda^{-1}) = 1 - e^{-1} = 0.6321, \tag{A1.1.13}$$

i.e. 63.21 per cent of the area under the distribution in Fig. 5.1.2 lies below 
the mean (1.0 in Fig. 5.1.2). 






The median (see § 2.5, p. 26) length of the intervals between random 
events is the length such that 50 per cent of intervals are longer, and 50 per 
cent shorter than it, i.e. it is the value of x bisecting the area under the 
distribution curve. If the population median value of x is denoted $x_m$ then, 
from (A1.1.12), 

$$F(x_m) = 1 - e^{-\lambda x_m} = 0.5,$$

$$\text{i.e.} \quad x_m = \lambda^{-1}\log_e 2 = 0.69315\,\lambda^{-1}. \tag{A1.1.14}$$

This is shown on Fig. 5.1.2. As expected for a positively skewed distribu- 
tion (see § 4.5), the population median is less than (in fact 69.315 per cent of) 
the population mean, λ⁻¹. The mode of the distribution is even lower at 
x = 0, as seen from Fig. 5.1.2. 
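
These figures for the exponential distribution can be checked with a brief sketch. The rate constant chosen below is arbitrary, since the proportions do not depend on it, and the function name `F` simply mirrors the distribution function in the text.

```python
import math

lam = 2.0   # arbitrary rate constant (events per unit time)


def F(x, lam):
    """Distribution function of the exponential distribution, 1 - exp(-lam * x)."""
    return 1.0 - math.exp(-lam * x)


mean = 1.0 / lam
median = math.log(2.0) / lam

print(F(mean, lam))      # proportion of intervals shorter than the mean
print(F(median, lam))    # proportion of intervals shorter than the median
print(median / mean)     # median as a fraction of the mean
```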

The variance of an exponentially distributed variable, from (A1.2.2), is 

$$\mathrm{var}(x) = \lambda^{-2}. \tag{A1.1.15}$$

For details see, e.g. Brownlee (1966, p. 59). 

The expectation of a function of x 

The expectation, or long run mean, of the value of any function of x, say 
g(x), can be found without first finding the probability density of g(x). 
The derivation is given, for example, by Brownlee (1966, p. 55). The results, 
analogous to (A1.1.1) and (A1.1.2), are 

$$E[g(x)] = \sum_{\text{all } x} g(x)\,P(x) \quad \text{for discontinuous distributions,} \tag{A1.1.16}$$

$$E[g(x)] = \int_{-\infty}^{+\infty} g(x)\,f(x)\,dx \quad \text{for continuous distributions.} \tag{A1.1.17}$$

The expectation of a function of two random variables is discussed in 
§A1.4. 

A1.2. Variance 

For any variable x, with expectation μ, the population variance of x is 
defined as the expected value (long run mean) of the square of the deviation 
of x from μ = E(x), i.e. 

$$\mathrm{var}(x) = E[(x-\mu)^{2}] \tag{A1.2.1}$$

$$\qquad\quad = E[x^{2}] - (E[x])^{2}. \tag{A1.2.2}$$

The second form of the definition follows from the first by expanding the 
square. It shows that E[x²] is not the same, in general, as (E[x])² (this is an 
example of the relation (A1.1.6)). 

Use of this definition in conjunction with (A1.1.1) or (A1.1.2) gives, for 
example, the variance of the binomial distribution as n$\mathcal{P}$(1 − $\mathcal{P}$), of the 
Poisson as m, of the exponential as λ⁻², and of the normal as σ², as asserted 
in (3.4.4), (3.5.3), (A1.1.15), and (4.2.1). For details of the derivations see 
the references at the beginning of this section. The mean and variance of a 
function of two random variables are discussed in § A1.4. 
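
As a rough numerical check of two of these results, the second form of the definition, E[x²] − (E[x])², can be applied directly to tabulated probabilities. The binomial case uses n = 3 and a success probability of 0.9 as in the earlier example; the Poisson mean m = 2.5 and the truncation of the Poisson series at 60 terms are arbitrary choices made here.

```python
from math import comb, exp, factorial


def variance(dist):
    """Variance of a discrete distribution given as {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    mean_sq = sum(x * x * p for x, p in dist.items())
    return mean_sq - mean**2


binom = {r: comb(3, r) * 0.9**r * 0.1**(3 - r) for r in range(4)}

m = 2.5
poisson = {r: m**r * exp(-m) / factorial(r) for r in range(60)}  # truncated series

print(variance(binom), 3 * 0.9 * 0.1)
print(variance(poisson), m)
```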




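The standardized form of any random variable, x, can be defined as X say, 
where 

$$X = \frac{x - E[x]}{\sqrt{\mathrm{var}(x)}} \tag{A1.2.3}$$

(see, for example, the standard normal distribution, § 4.3). X must always 
have a population mean of zero, and population variance of one because 

$$E[X] = E\left[\frac{x - E[x]}{\sqrt{\mathrm{var}(x)}}\right] = \frac{E[x] - E[x]}{\sqrt{\mathrm{var}(x)}} = 0 \tag{A1.2.4}$$

and, from (A1.2.2), (A1.2.4), and (A1.2.1), 

$$\mathrm{var}(X) = E[X^{2}] - (E[X])^{2} = E[X^{2}] = E\left[\frac{(x - E[x])^{2}}{\mathrm{var}(x)}\right] = \frac{\mathrm{var}(x)}{\mathrm{var}(x)} = 1. \tag{A1.2.5}$$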
A1.3. Non-experimental bias 

It has been mentioned in § 2.6 in connection with the standard deviation, 
and in § 12.8, that estimates of quantities calculated from observations may 
be biased even when the observations themselves have no bias at all. In this 
case the estimation method (i.e. the formula used to calculate the sample 
estimate, say θ̂, of a parameter θ from the observations) is said to be biased. 
An estimation method is said to have a bias = E[θ̂] − θ, and it is said to be 
unbiased if 

$$E[\hat{\theta}] = \theta. \tag{A1.3.1}$$

For example, the sample arithmetic mean is an unbiased estimate of the 
parameter E[x] (whatever the distribution of x) because E[x̄] = E[x]. 
Using (A1.1.4), 

$$E[\bar{x}] = E\left[\sum x / N\right] = E\left[\sum x\right] / N = \sum (E[x]) / N = N\,E[x]/N = E[x] = \mu. \tag{A1.3.2}$$

Furthermore, (2.6.3) gives an unbiased estimate of the population variance, 
var(x) or σ²(x), for any distribution because, from (A1.1.3), (A1.1.4), and 
(A1.2.1), 

$$E\left[\frac{\sum (x-\mu)^{2}}{N}\right] = \frac{E\left[\sum (x-\mu)^{2}\right]}{N} = \frac{\sum E[(x-\mu)^{2}]}{N} = \frac{N\,\mathrm{var}(x)}{N} = \mathrm{var}(x). \tag{A1.3.3}$$

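However, if μ is replaced by its (unbiased) sample estimate, x̄, an unbiased 
estimate of var(x) is no longer obtained (as discussed in § 2.6). If σ̂² = 
Σ(x − x̄)²/N then 

$$\begin{aligned}
N\hat{\sigma}^{2} = \sum (x-\bar{x})^{2} &= \sum \left[(x-\mu) - (\bar{x}-\mu)\right]^{2} \\
&= \sum (x-\mu)^{2} - 2(\bar{x}-\mu)\sum (x-\mu) + N(\bar{x}-\mu)^{2} \\
&= \sum (x-\mu)^{2} - N(\bar{x}-\mu)^{2}
\end{aligned}$$

because 2(x̄ − μ).Σ(x − μ) = 2(x̄ − μ)(Σx − Nμ) = 2(x̄ − μ)(Nx̄ − Nμ) = 
2N(x̄ − μ)². Thus, using (A1.1.3), (A1.1.4), (A1.2.1), and (2.7.8), 

$$\begin{aligned}
N\,E[\hat{\sigma}^{2}] &= E\left[\sum (x-\mu)^{2} - N(\bar{x}-\mu)^{2}\right] \\
&= \sum E[(x-\mu)^{2}] - N\,E[(\bar{x}-\mu)^{2}] \\
&= N\,E[(x-\mu)^{2}] - N\,E[(\bar{x}-\mu)^{2}] \\
&= N\,\mathrm{var}(x) - N\,\mathrm{var}(\bar{x}) = N\,\mathrm{var}(x) - N\,\mathrm{var}(x)/N
\end{aligned}$$

and so 

$$E[\hat{\sigma}^{2}] = \mathrm{var}(x)\,\frac{N-1}{N}. \tag{A1.3.4}$$

Because σ̂² is a biased estimate of var(x), its expectation being less than 
var(x), it is not used. Instead it is multiplied by N/(N − 1) to correct the 
bias, giving the usual estimate, (2.6.2), with N − 1 rather than N in the 
denominator. 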
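
The size of this bias can be verified exactly for a small case by enumerating every possible sample. The sketch below assumes a hypothetical population in which x is 0 or 1 with equal probability (so the population variance is 0.25) and samples of size N = 3.

```python
from itertools import product

values = (0.0, 1.0)   # hypothetical population: 0 or 1 with equal probability
N = 3                 # sample size
true_var = 0.25       # population variance of this two-point distribution

samples = list(product(values, repeat=N))   # all 2**N equally likely samples


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample)


# Long-run means of the two estimators, averaged over every possible sample.
mean_biased = sum(sum_sq_dev(s) / N for s in samples) / len(samples)
mean_unbiased = sum(sum_sq_dev(s) / (N - 1) for s in samples) / len(samples)

print(mean_biased, true_var * (N - 1) / N)
print(mean_unbiased, true_var)
```

Dividing by N gives an average of two thirds of the population variance, as the factor (N − 1)/N predicts, whereas dividing by N − 1 recovers the population variance.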

A1.4. Expectation and variance with two random variables. The 
sum of a variable number of random variables† 

In dealing with a function of two random variables, a proper procedure is to 
average over one of them holding the other fixed, and then to average over the 
other. This is rather like averaging the rows in a square table, and then averaging 
the row averages to find the grand average. The proof will be outlined for the 
case of the sum of a randomly variable number of random variables, but the result 
((A1.4.11) and (A1.4.12)) is general. The result is used and illustrated in § 3.6. 
Relevant information will be found, for example, in Mood and Graybill (1963, 
p. 117) and Bailey (1964). 

It will be necessary, as on pp. 68 and 388, to distinguish between random 
variables, denoted $\mathbf{z}$, $\mathbf{m}$, etc., and particular values that these variables may 
take, denoted z, m, etc. 

Suppose we are interested, as in § 3.6, in the sum, itself a random variable 
denoted $\mathbf{S}_{\mathbf{m}}$, of $\mathbf{m}$ values of $\mathbf{z}$, where $\mathbf{m}$ and $\mathbf{z}$ are random variables, i.e. 

$$\mathbf{S}_{\mathbf{m}} = \mathbf{z}_1 + \mathbf{z}_2 + \dots + \mathbf{z}_{\mathbf{m}}. \tag{A1.4.1}$$

The population means and variances of the variables will be denoted, for brevity, 

$$E(\mathbf{m}) = \mu_m, \qquad \mathrm{var}(\mathbf{m}) = \sigma_m^{2}, \tag{A1.4.2}$$

$$E(\mathbf{z}) = \mu_z, \qquad \mathrm{var}(\mathbf{z}) = \sigma_z^{2}. \tag{A1.4.3}$$

We shall deal only with the case where the z values are independent, and each 
S value is made up of a random sample (of variable size) from the population of 
z values. It is assumed in (A1.4.3) that the z values all have the same mean and 
variance, for example, that they are from a single population. 

The probability that $\mathbf{S}_{\mathbf{m}}$ is equal to or less than a specified value S, i.e. the dis- 
tribution function of the sum (see § 4.1), can be written, looking at each possible 
value of $\mathbf{m}$ separately, as 

$$P[\mathbf{S}_{\mathbf{m}} \le S] = P[(\mathbf{m} = 1 \text{ and } \mathbf{S}_1 \le S) \text{ or } (\mathbf{m} = 2 \text{ and } \mathbf{S}_2 \le S) \text{ or } \dots]. \tag{A1.4.4}$$

† I am very grateful to R. Galbraith, Department of Statistics, University College, 
London, for showing me how to obtain the results in this section. 




The events in parentheses are mutually exclusive so, using the addition rule 
(2.4.2), this becomes a sum (over all possible m values), viz. 

$$\sum_{m} P[\mathbf{m} = m \text{ and } \mathbf{S}_m \le S]. \tag{A1.4.5}$$

Now, using the multiplication rule in its general form (2.4.4) shows that 
$P[\mathbf{m} = m \text{ and } \mathbf{S}_m \le S]$ can be written in terms of the conditional probabilities 
as $P[\mathbf{S}_{\mathbf{m}} \le S \mid \mathbf{m} = m].P[\mathbf{m} = m]$ and so (A1.4.5) becomes 

$$P[\mathbf{S}_{\mathbf{m}} \le S] = \sum_{m} P[\mathbf{S}_{\mathbf{m}} \le S \mid \mathbf{m} = m].P[\mathbf{m} = m]. \tag{A1.4.6}$$

This can be written in terms of distribution functions (see § 4.1) as 

$$F(S) = \sum_{m} F(S \mid m).P[\mathbf{m} = m] \tag{A1.4.7}$$

and differentiating this with respect to S gives, as in (4.1.5), the probability 
density function as 

$$f(S) = \sum_{m} f(S \mid m).P[\mathbf{m} = m]. \tag{A1.4.8}$$

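To find the expectation of any function of the sum, $g(\mathbf{S}_{\mathbf{m}})$, we now simply use 
this result in the definition of expectation, (A1.1.17), giving 

$$\begin{aligned}
E[g(\mathbf{S}_{\mathbf{m}})] &= \int g(S)\,f(S)\,dS \\
&= \int g(S).\sum_{m} f(S \mid m).P[\mathbf{m} = m].dS \\
&= \sum_{m} \left\{\int g(S)\,f(S \mid m)\,dS\right\}.P[\mathbf{m} = m]
\end{aligned} \tag{A1.4.9}$$

$$\qquad\qquad = \sum_{m} \left\{E[g(\mathbf{S}_{\mathbf{m}}) \mid \mathbf{m} = m]\right\}.P[\mathbf{m} = m]. \tag{A1.4.10}$$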

The last step follows because the term in curly brackets in (A1.4.9) is simply the 
expectation of the function g, when $\mathbf{m}$ has a fixed value, m. The value of this 
will of course depend on (be a function of) the value, m, chosen, so (A1.4.10) has 
the form $\sum_m [(\text{function of } m).P(\mathbf{m} = m)]$, just like (A1.1.16), so it means that the 
term in curly brackets is being averaged over all m values and (A1.4.10) can 
therefore be written 

$$E[g(\mathbf{S}_{\mathbf{m}})] = E_m\left\{E[g(\mathbf{S}_{\mathbf{m}}) \mid \mathbf{m} = m]\right\}, \tag{A1.4.11}$$

which describes in symbols the two-stage averaging process mentioned at the 
beginning of the section. The result is much more general than it appears from 
this derivation. If $\mathbf{x}$ and $\mathbf{y}$ are any two random variables, continuous or dis- 
continuous, then, analogously to (A1.4.11), we have 

$$E[g(\mathbf{x}, \mathbf{y})] = E_y\left\{E_x[g(\mathbf{x}, \mathbf{y}) \mid \mathbf{y} = y]\right\}. \tag{A1.4.12}$$

The mean value of the sum follows directly from (A1.4.11) if the function 
$g(\mathbf{S}_{\mathbf{m}})$ is simply identified with $\mathbf{S}_{\mathbf{m}}$. Averaging the sum for a fixed value of $\mathbf{m}$, 
using the definitions in (A1.4.1)-(A1.4.3), gives the term in curly brackets in 
(A1.4.11) as 

$$E[\mathbf{S}_{\mathbf{m}} \mid \mathbf{m} = m] = m\mu_z, \tag{A1.4.13}$$






i.e. the average value of the total of a fixed number, m, of values of z is m times 
the average value of z, fairly obviously. Actually, this step is not quite as obvious 
as it looks. Written out in full we have 

$$\begin{aligned}
E[\mathbf{S}_{\mathbf{m}} \mid \mathbf{m} = m] &= E[(\mathbf{z}_1 + \mathbf{z}_2 + \dots + \mathbf{z}_m) \mid \mathbf{m} = m] \\
&= E[\mathbf{z}_1 \mid \mathbf{m} = m] + E[\mathbf{z}_2 \mid \mathbf{m} = m] + \dots + E[\mathbf{z}_m \mid \mathbf{m} = m]
\end{aligned} \tag{A1.4.14}$$

and only if $\mathbf{m}$ is independent of the z values, i.e. if the size of z values does not 
depend on whether m is large or small, can this be written 

$$\qquad = E[\mathbf{z}_1] + E[\mathbf{z}_2] + \dots + E[\mathbf{z}_m] \tag{A1.4.15}$$

and if all the z values have the same mean, $\mu_z$, as assumed, this is simply $m\mu_z$, 
as stated in (A1.4.13). 

Having found this for a fixed value of m, we now do the second stage of 
averaging, over m values, treating m as a random variable though $\mu_z$ is, of course, 
a constant. This gives, using (A1.4.11) with (A1.4.13) and (A1.4.2), 

$$E[\mathbf{S}_{\mathbf{m}}] = E_m[\mathbf{m}\mu_z] = \mu_z E_m[\mathbf{m}] = \mu_z\mu_m \tag{A1.4.16}$$

which is just what would be expected for the average value of the sum of m 
values of z. 

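To find the variance of $\mathbf{S}_{\mathbf{m}}$ we use the definition (A1.2.2) which is 

$$\mathrm{var}(\mathbf{S}_{\mathbf{m}}) = E[\mathbf{S}_{\mathbf{m}}^{2}] - (E[\mathbf{S}_{\mathbf{m}}])^{2}. \tag{A1.4.17}$$

The only thing needed now is to find the expectation of $\mathbf{S}_{\mathbf{m}}^{2}$. To do this we 
use (A1.4.11) again, but this time $g(\mathbf{S}_{\mathbf{m}})$ is identified with $\mathbf{S}_{\mathbf{m}}^{2}$. So first we want to 
find the term in curly brackets in (A1.4.11), the expectation of $\mathbf{S}_{\mathbf{m}}^{2}$ when $\mathbf{m}$ has a 
fixed value, m. This we find by rearranging the definition of variance (A1.2.2) 
to give the general relation 

$$E[x^{2}] = \mathrm{var}(x) + (E[x])^{2} = \sigma_x^{2} + \mu_x^{2}. \tag{A1.4.18}$$

The term in curly brackets is therefore 

$$E[\mathbf{S}_{\mathbf{m}}^{2} \mid \mathbf{m} = m] = \mathrm{var}(\mathbf{S}_{\mathbf{m}} \mid \mathbf{m} = m) + (E[\mathbf{S}_{\mathbf{m}} \mid \mathbf{m} = m])^{2} = m\sigma_z^{2} + m^{2}\mu_z^{2}, \tag{A1.4.19}$$

the first term being the variance of the sum of m independent variables, from 
(2.7.4), and the second term following from (A1.4.13) above. (This step again 
assumes that the $\mathbf{z}_i$ are independent of $\mathbf{m}$, as in (A1.4.13).) Now we average 
over m values, i.e. we now treat m as a random variable though $\sigma_z^{2}$ and $\mu_z$ are 
constants of course. Thus (A1.4.11) gives, using (A1.4.19) and (A1.4.2), 

$$\begin{aligned}
E[\mathbf{S}_{\mathbf{m}}^{2}] &= E_m[\mathbf{m}\sigma_z^{2} + \mathbf{m}^{2}\mu_z^{2}] \\
&= \sigma_z^{2} E_m[\mathbf{m}] + \mu_z^{2} E_m[\mathbf{m}^{2}] \\
&= \sigma_z^{2}\mu_m + \mu_z^{2}(\sigma_m^{2} + \mu_m^{2}),
\end{aligned} \tag{A1.4.20}$$

the last line following by the use of (A1.4.18) to find $E[\mathbf{m}^{2}]$. 

The variance of $\mathbf{S}_{\mathbf{m}}$ can now be found by substituting (A1.4.16) and (A1.4.20) 
into (A1.4.17) giving the required result 

$$\mathrm{var}(\mathbf{S}_{\mathbf{m}}) = \sigma_z^{2}\mu_m + \mu_z^{2}\sigma_m^{2}. \tag{A1.4.21}$$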




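Using the coefficients of variation defined in (2.6.4), i.e. 

$$\mathcal{C}(\mathbf{z}) = \sigma_z/\mu_z, \quad \mathcal{C}(\mathbf{m}) = \sigma_m/\mu_m \quad \text{and} \quad \mathcal{C}(\mathbf{S}_{\mathbf{m}}) = \sqrt{\mathrm{var}(\mathbf{S}_{\mathbf{m}})}\,/\,E[\mathbf{S}_{\mathbf{m}}],$$

we get, using (A1.4.21) and (A1.4.16), 

$$\mathcal{C}^{2}(\mathbf{S}_{\mathbf{m}}) = \frac{\mathcal{C}^{2}(\mathbf{z})}{\mu_m} + \mathcal{C}^{2}(\mathbf{m}). \tag{A1.4.22}$$

An illustration of the use of this result is given in § 3.6 (p. 69). 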
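
The two-stage averaging argument can be checked exactly for a small case. The sketch below assumes hypothetical distributions, with the number of terms equally likely to be 1, 2, or 3 and each z equal to 0 or 1 with equal probability, and compares the variance of the sum found by enumeration with the value (variance of z)(mean of m) + (mean of z)²(variance of m) that follows from the mean and variance definitions above.

```python
from itertools import product

m_dist = {1: 1 / 3, 2: 1 / 3, 3: 1 / 3}   # hypothetical distribution of the number of terms
z_dist = {0.0: 0.5, 1.0: 0.5}             # hypothetical distribution of each z value


def moments(dist):
    """Mean and variance of a discrete distribution {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    var = sum(v * v * p for v, p in dist.items()) - mean**2
    return mean, var


mu_m, var_m = moments(m_dist)
mu_z, var_z = moments(z_dist)

# Enumerate every sample of z values for each fixed m, then average over m.
ES = ES2 = 0.0
for m, pm in m_dist.items():
    for zs in product(z_dist, repeat=m):
        prob = pm
        for z in zs:
            prob *= z_dist[z]   # z values are independent of each other and of m
        s = sum(zs)
        ES += prob * s
        ES2 += prob * s * s

var_S = ES2 - ES**2
predicted = var_z * mu_m + mu_z**2 * var_m
print(ES, mu_z * mu_m)
print(var_S, predicted)
```

Enumeration is used rather than simulation so that the agreement is exact apart from rounding error.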






Appendix 2 



Stochastic (or random) processes 

Some basic results and an attempt to explain the unexpected properties of random 
processes 



The Science of the age, in short, is physical, chemical, physiological ; in all 
shapes mechanical. Our favourite Mathematics, the highly prized exponent of all 
these other sciences, has also become more and more mechanical. Excellence in 
what is called its higher departments depends less on natural genius than on 
acquired expertness in wielding its machinery. Without under-valuing the 
wonderful results which a Lagrange or Laplace educes by means of it, we may 
remark, that their calculus, differential and integral, is little else than a more 
cunningly-constructed arithmetical mill ; where the factors being put in, are, as it 
were, ground into the true product, under cover, and without other effort on our 
part than steady turning of the handle. 

Thomas Carlyle 1829 
(Signs of the Times, Edinburgh Review, No. 98). 



The following discussions require more calculus than is needed to follow 
the main body of the book so they have been confined to an appendix to 
avoid scaring the faint-hearted. However, the principles involved are the 
important thing, so do not worry if you cannot see, for example, how an 
integral is evaluated. That is merely a technical matter that can always be 
cleared up if it becomes necessary. 

A2.1. Scope of the stochastic approach 

In many cases the probabilistic approach is necessary for, or at least is 
enlightening in, the description of processes that are variable by their nature 
rather than because of experimental error. This approach might, for example, 
involve consideration of (1) the probability of birth and death in the study of 
populations, (2) the probability of becoming ill in the study of epidemics, 
(3) the probability that a queue (e.g. for hospital appointments) will have a 
particular length and that the waiting time before a queue member is served 
has a particular value, (4) the random movement of a molecule undergoing 
Brownian motion in the study of diffusion processes, and (5) the probability 
that a molecule will undergo a chemical reaction within a specified period of 
time (see examples in §§ A2.3 and A2.4). 

The appendix will deal with aspects of only one particular stochastic 
process, the Poisson process which has already been discussed in Chapters 
3 and 5. It is a characteristic of this process that events occurring in non- 
overlapping intervals of time are quite independent of each other. The same 






idea can also be expressed by saying, at the risk of being anthropomorphic, 
that the process has 'no memory' and therefore is unaffected by what has 
happened in the past, or that the process 'does not age' (see also Cox 
(1962, pp. 3-6 and 20)). 

Examples of Poisson processes discussed in Chapters 3 and 5 were the 
disintegration of radioactive atoms at random intervals and the random 
occurrence of miniature end plate potentials (MEPP). Other examples are 
(1) the random length of time that a molecule remains adsorbed on a mem- 
brane before being desorbed (e.g. an atropine molecule on its cellular receptor 
site, see § A2.4), and (2) the random length of time that elapses before a 
drug molecule is broken down in the experiment described in § 12.6. 

The lifetime of a molecule on its adsorption site (or of a drug molecule 
in solution, or of a radioactive atom) is a random variable with the same 
properties as the random intervals between MEPP (see § 5.1). In the case 
of the adsorbed molecule, this implies that the complex formed between 
molecule and adsorption site does not age, and the probability of the complex 
breaking up in the next 6 seconds, say, is a constant and does not depend 
on how long the molecule has already been adsorbed, just as the probability 
of a penny coming down heads was supposed to be constant at each throw, 
regardless of how many heads have already occurred, when discussing the 
binomial distribution in Chapter 3. Consequently the Poisson distribution 
can be derived from the binomial as explained in §3.6. Another derivation 
is given in § A2.2 below. 
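
The statement that the complex 'does not age' can be sketched numerically, assuming an exponentially distributed lifetime as introduced in Chapter 5. The rate constant and the elapsed times below are hypothetical, and the function names are introduced only for this illustration.

```python
import math

lam = 0.2   # hypothetical rate constant, per second


def survival(t):
    """Probability that an exponentially distributed lifetime exceeds t."""
    return math.exp(-lam * t)


def prob_breakup_within(t, already_elapsed):
    """P[break-up in the next t seconds | complex has already lasted already_elapsed]."""
    return 1.0 - survival(already_elapsed + t) / survival(already_elapsed)


# Probability of break-up in the next 5 seconds after various elapsed times.
probs = [prob_breakup_within(5.0, s) for s in (0.0, 1.0, 10.0, 100.0)]
print(probs)
```

Every entry is the same, 1 − e^(−5λ), however long the molecule has already been adsorbed.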

The arrival of buses would not, in general, be a Poisson process, although 
it often seems pretty haphazard. The waiting time problem for randomly 
arriving buses, discussed in § 5.2, is typical of the sort of result that is 
usually surprising and puzzling to people who have not got used to the 
properties of random processes. I certainly found it surprising and puzzling 
until recently, and so I hope the reader will find the results presented below 
as enlightening as I did. 

For further reading on the subject see, for example, Cox (1962), Feller
(1967, 1966), Bailey (1964, 1967), Cox and Lewis (1966), and Brownlee 
(1966, p. 190). 

A2.2. A derivation of the Poisson distribution

As mentioned in § 3.6, the distribution follows directly from the condition
that events in non-overlapping intervals of time or space are independent
of each other, using the definition of independence discussed in § 2.4.

The probability of one event occurring in the time interval between $t$
and $t + \Delta t$ can be defined as $\lambda \Delta t$, if $\Delta t$ is small enough. From the
discussion of the nature of the Poisson process in §§ 3.6 and A2.1, it follows
that $\lambda$ must be a constant (i.e. it does not vary with time, and does not
depend on the past or present state of the system) that characterizes the rate
of occurrence of events. More properly, it should be said that the probability
of one occurrence in the infinitesimal time interval, $dt$, between $t$ and $t + dt$
is constant and can be written $\lambda\,dt$. This definition, plus the condition of
independence, is sufficient to define the Poisson distribution. If finite
time intervals, $\Delta t$, are considered then the probability of one event in the
interval between $t$ and $t + \Delta t$ should be written $\lambda \Delta t + o(\Delta t)$ (see (A2.2.9)).
Furthermore, the probability of more than one event occurring in the interval
$\Delta t$ becomes negligible when the interval is very short, and so it is also
written $o(\Delta t)$, as shown in (A2.2.11).

The symbol $o(\Delta t)$, which occurs often when discussing stochastic processes,
is used to stand for any quantity that becomes negligible relative to $\Delta t$
when the interval length $\Delta t$ becomes very small (it does not always stand for
the same quantity, and may be used twice in the same expression standing for
a different quantity each time). More precisely, any quantity is written
$o(\Delta t)$ if it obeys the definition

$$\lim_{\Delta t \to 0} \frac{o(\Delta t)}{\Delta t} = 0, \tag{A2.2.1}$$

so no approximation will be involved in the limit in ignoring $o(\Delta t)$ terms.

The probability that there will be no events between $t$ and $t + \Delta t$ is thus,
from the addition rule ((2.4.2) and (2.4.3)), 1 − probability of one or more
events $= 1 - \lambda \Delta t - o(\Delta t)$.

The probability that $r$ events occur between 0 (the time when measurement
is started) and $t$ will be symbolized $P(r, t)$, an extension of the notation
used in § 3.6 and Chapter 5. Using this notation, $P(0, t + \Delta t)$ stands for the
probability that zero events occur between 0 and $t + \Delta t$. For this to happen
there must be both

(zero events between 0 and $t$) and (zero events between $t$ and $t + \Delta t$).

The probability of the first of these contingencies is $P(0, t)$, and the
probability of the second is, as above, $1 - \lambda \Delta t - o(\Delta t)$. If the events in the
non-overlapping time intervals from 0 to $t$ and from $t$ to $t + \Delta t$ are independent
(this is the crucial, and very strong, assumption), the probability that both
contingencies will happen follows from the multiplication rule (2.4.6), and is
the product of the separate probabilities, i.e.

$$P(0, t + \Delta t) = P(0, t)\,[1 - \lambda \Delta t - o(\Delta t)]. \tag{A2.2.2}$$

Rearranging this gives

$$\frac{P(0, t + \Delta t) - P(0, t)}{\Delta t} = -\lambda P(0, t) - P(0, t)\,\frac{o(\Delta t)}{\Delta t}.$$

In the limit, letting $\Delta t \to 0$, the left-hand side becomes, by definition of
differentiation, $dP(0, t)/dt$ (see, for example, Massey and Kestelman (1964,
p. 59)), and the second term on the right-hand side becomes zero (from
(A2.2.1)) so

$$\frac{dP(0, t)}{dt} = -\lambda P(0, t) \tag{A2.2.3}$$

and the solution of this differential equation is

$$P(0, t) = e^{-\lambda t}. \tag{A2.2.4}$$






This is found using the condition that $P(0, 0) = e^{0} = 1$ (i.e. it is certain
that zero events will occur in zero time). The solution is easily checked by
differentiating (A2.2.4), giving (A2.2.3) back again, thus
$dP(0, t)/dt = d e^{-\lambda t}/dt = -\lambda e^{-\lambda t} = -\lambda P(0, t)$. Equation (A2.2.4) is just
the probability of zero events occurring in time $t$ given by the Poisson
distribution (3.5.1), if $\lambda$ is interpreted as the average number of events in
unit time (see §§ 3.5 and 5.1 and eqn (A1.1.7)), so $m = \lambda t$ is the mean number
of events in time $t$.
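
The passage from the difference equation (A2.2.2) to the exponential solution (A2.2.4) can be checked numerically. The sketch below applies (A2.2.2) over and over with the $o(\Delta t)$ term dropped and compares the result with $e^{-\lambda t}$. The rate and the elapsed time are hypothetical values chosen only for illustration, and the function name is not from the text.

```python
import math


def p_zero_by_steps(rate, t, n_steps):
    """Apply P(0, t+dt) = P(0, t) * (1 - rate*dt) repeatedly, ignoring o(dt) terms."""
    dt = t / n_steps
    p = 1.0                      # P(0, 0) = 1: no events in zero time
    for _ in range(n_steps):
        p *= 1.0 - rate * dt
    return p


rate, t = 2.0, 1.5               # hypothetical rate and elapsed time
exact = math.exp(-rate * t)      # the solution (A2.2.4)
for n_steps in (10, 100, 10_000):
    approx = p_zero_by_steps(rate, t, n_steps)
    print(n_steps, approx, exact, abs(approx - exact))
```

The discrepancy shrinks as the interval $\Delta t = t/n$ is made shorter, which is what ignoring the $o(\Delta t)$ terms in the limit amounts to.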

To find the Poisson distribution when $r > 0$ notice that $r$ events will
occur between 0 and $t + \Delta t$ if either

[($r$ events occur between 0 and $t$) and (zero events occur between $t$ and $t + \Delta t$)]
or
[($r - 1$ events occur between 0 and $t$) and (one event occurs between $t$ and $t + \Delta t$)].

The probabilities of the four events in brackets have been defined as $P(r, t)$,
$(1 - \lambda \Delta t - o(\Delta t))$, $P(r - 1, t)$, and $\lambda \Delta t + o(\Delta t)$ respectively. Therefore, using
the addition rule (2.4.2) and the multiplication rule for independent events
(2.4.6), the probability of $r$ events occurring between 0 and $t + \Delta t$ becomes

$$P(r, t + \Delta t) = P(r, t)\,[1 - \lambda \Delta t - o(\Delta t)] + P(r - 1, t)\,[\lambda \Delta t + o(\Delta t)]. \tag{A2.2.5}$$

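Rearranging this gives

$$\frac{P(r, t + \Delta t) - P(r, t)}{\Delta t} = -\lambda P(r, t) + \lambda P(r - 1, t) + \frac{o(\Delta t)}{\Delta t}\,[P(r - 1, t) - P(r, t)]. \tag{A2.2.6}$$

Again letting $\Delta t \to 0$ gives, using (A2.2.1) as above,

$$\frac{dP(r, t)}{dt} = -\lambda P(r, t) + \lambda P(r - 1, t). \tag{A2.2.7}$$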



This holds for any $r$ greater than 0, so putting $r = 1$ gives an equation
for $P(1, t)$, the probability of $r = 1$ event occurring in a time interval of
length $t$. Inserting the value of $P(r - 1, t) = P(0, t) = e^{-\lambda t}$ from (A2.2.4)
into (A2.2.7) results in an equation that can be solved giving $P(1, t) = (\lambda t)e^{-\lambda t}$,
which is the Poisson probability for $r = 1$ defined in eqn (3.5.1) and § 5.2.
This can be inserted into (A2.2.7) with $r = 2$ to find $P(2, t)$, the next term
of the Poisson series. Alternatively, simply notice that the probability of
$r$ events in a time interval of length $t$, the solution of (A2.2.7) for any value of
$r$ (greater than 0), is

$$P(r, t) = \frac{(\lambda t)^r}{r!}\, e^{-\lambda t}, \tag{A2.2.8}$$




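which is the Poisson distribution defined in (3.5.1) (see also § 5.1), because it
has been shown in (A2.2.4) that (A2.2.8) does actually hold for $r = 0$ as well.
This solution is easily checked by differentiating (A2.2.8) giving

$$\begin{aligned}
\frac{dP(r, t)}{dt} &= \frac{d}{dt}\left[\frac{(\lambda t)^r}{r!}\, e^{-\lambda t}\right]
 = \frac{r \lambda (\lambda t)^{r-1}}{r!}\, e^{-\lambda t} + \frac{(\lambda t)^r}{r!}\left[-\lambda e^{-\lambda t}\right] \\
&= \lambda\, \frac{(\lambda t)^{r-1}}{(r-1)!}\, e^{-\lambda t} - \lambda\, \frac{(\lambda t)^r}{r!}\, e^{-\lambda t} \\
&= \lambda P(r - 1, t) - \lambda P(r, t).
\end{aligned}$$

Thus (A2.2.8) is a solution of (A2.2.7).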
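
As a numerical counterpart to the check by differentiation, the set of equations (A2.2.3) and (A2.2.7) can be stepped forward in time from the starting condition $P(0, 0) = 1$ and compared with the closed form (A2.2.8). The sketch below uses simple Euler steps. The rate, the time and the step count are hypothetical, and the function names are illustrative rather than anything defined in the text.

```python
import math


def poisson_by_euler(rate, t, r_max, n_steps=20_000):
    """Integrate dP(r,t)/dt = -rate*P(r,t) + rate*P(r-1,t) by Euler steps."""
    dt = t / n_steps
    p = [1.0] + [0.0] * r_max    # at t = 0, zero events is certain
    for _ in range(n_steps):
        new = [p[0] - rate * p[0] * dt]          # r = 0 has no P(r-1) term
        for r in range(1, r_max + 1):
            new.append(p[r] + (-rate * p[r] + rate * p[r - 1]) * dt)
        p = new
    return p


def poisson_formula(rate, t, r):
    """Closed form (A2.2.8)."""
    return (rate * t) ** r * math.exp(-rate * t) / math.factorial(r)


rate, t = 1.3, 2.0               # hypothetical values
numeric = poisson_by_euler(rate, t, r_max=5)
for r, value in enumerate(numeric):
    print(r, value, poisson_formula(rate, t, r))
```

Each pair of printed probabilities should agree closely, with the small difference coming from the finite step length.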

Why the remainder terms can be neglected
Having derived the Poisson distribution, the remainder terms, which were
written $o(\Delta t)$ above, can be written explicitly, so it can be seen that they do in
fact become negligible relative to $\Delta t$ when $\Delta t \to 0$, as stated in (A2.2.1).

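The probability of $r = 1$ event occurring in the interval $\Delta t$ is found by putting
$r = 1$ and $t = \Delta t$ in (A2.2.8). The exponential is then expanded in series (as in
(8.6.2)) giving

$$\begin{aligned}
P(1, \Delta t) &= \lambda \Delta t\, e^{-\lambda \Delta t} \\
&= \lambda \Delta t \left[1 - \lambda \Delta t + \frac{(\lambda \Delta t)^2}{2!} - \cdots\right] \\
&= \lambda \Delta t - (\lambda \Delta t)^2 + \frac{(\lambda \Delta t)^3}{2!} - \cdots \\
&= \lambda \Delta t + o(\Delta t)
\end{aligned} \tag{A2.2.9}$$

as stated at the beginning of this section. All the terms but the first on the
penultimate line can be written as $o(\Delta t)$ because they obey the definition
(A2.2.1), thus

$$\lim_{\Delta t \to 0} \frac{-(\lambda \Delta t)^2 + (\lambda \Delta t)^3/2! - \cdots}{\Delta t}
 = \lim_{\Delta t \to 0} \left[-\lambda^2 \Delta t + \frac{\lambda^3 (\Delta t)^2}{2!} - \cdots\right] = 0 \tag{A2.2.10}$$

because every term is zero when $\Delta t$ becomes zero.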

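The probability that more than one event ($r > 1$) occurs in $\Delta t$ is, from (A2.2.8),

$$P(r, \Delta t) = \frac{(\lambda \Delta t)^r}{r!}\, e^{-\lambda \Delta t}$$

and for all $r > 1$ this can also be written $o(\Delta t)$. For example, for $r = 2$ we have,
using the definition (A2.2.1),

$$\lim_{\Delta t \to 0} \frac{(\lambda \Delta t)^2 e^{-\lambda \Delta t}/2!}{\Delta t}
 = \lim_{\Delta t \to 0} \frac{\lambda^2 \Delta t\, e^{-\lambda \Delta t}}{2} = 0 \tag{A2.2.11}$$

as stated at the beginning of this section.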
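
The behaviour of the remainder terms can also be seen numerically. The sketch below evaluates, for a shrinking interval, the remainder in the probability of one event and the probability of more than one event, each divided by $\Delta t$. The rate is a hypothetical value and the function names are illustrative.

```python
import math


def poisson(rate, t, r):
    """Poisson probability of r events in time t, as in (A2.2.8)."""
    return (rate * t) ** r * math.exp(-rate * t) / math.factorial(r)


def remainder_ratios(rate, dt):
    """Return (remainder in P(1, dt)) / dt and P(more than one event) / dt."""
    one_event = poisson(rate, dt, 1)
    remainder_one = one_event - rate * dt              # the o(dt) term in (A2.2.9)
    more_than_one = 1.0 - poisson(rate, dt, 0) - one_event
    return remainder_one / dt, more_than_one / dt


rate = 3.0                                             # hypothetical rate
for dt in (0.1, 0.01, 0.001, 0.0001):
    print(dt, *remainder_ratios(rate, dt))
```

Both ratios move towards zero as the interval shortens, which is exactly the property required by the definition (A2.2.1).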






A2.3. The connection between the lifetimes of individual
adrenaline molecules and the observed breakdown rate
and half-life of adrenaline

In the experiment analysed in § 12.6 it is found that when adrenaline
was incubated with liver slices in vitro the concentration of adrenaline
fell exponentially or, to be more precise, there was no evidence that the
relationship was not exponential. The estimated rate constant was
$k = 0.07219\ \text{min}^{-1}$ (from (12.6.14)), i.e. the estimated time constant was
$1/k = 13.85$ min (from (12.6.16)), and the estimated half-life, from (12.6.6),
was $0.69315/k = 9.602$ min. The arguments in this section apply equally to
the disintegration of radioisotopes since the number of radioactive nuclei is
observed to fall with an exponential time course. One then considers the
lifetimes of individual unstable nuclei.

Focus attention on single adrenaline molecules. Suppose that they are
perfectly stable until, at zero time, the adrenaline solution is added to the
liver preparation that contains enzymes catalysing its catabolism. Suppose
that after the addition of enzymes at $t = 0$, there is a constant probability,†
$\lambda \Delta t + o(\Delta t)$ say, that any individual adrenaline molecule will be catabolized
in any short interval of time $\Delta t$. As before, $\lambda$ is a constant (it does not
vary with time) that characterizes the rate of catabolism. The probability
that the molecule will not be catabolized, from (2.4.3), is therefore
$1 - \lambda \Delta t - o(\Delta t)$. The argument is now exactly like that in § A2.2. Denote as $P(t)$
the probability that the molecule is still intact at time $t$. The molecule will
still be intact at time $t + \Delta t$ if

(it is still intact at time $t$) and (it is not catabolized between $t$ and $t + \Delta t$).

If these events are independent, then the multiplication rule of probability,
(2.4.6), implies

$$P(t + \Delta t) = P(t)\,[1 - \lambda \Delta t - o(\Delta t)]. \tag{A2.3.1}$$

This is like eqn. (A2.2.2). Rearranging gives

$$\frac{P(t + \Delta t) - P(t)}{\Delta t} = -\lambda P(t) - P(t)\,\frac{o(\Delta t)}{\Delta t}$$

and, using (A2.2.1) just as in § A2.2, when $\Delta t \to 0$ this becomes
$dP(t)/dt = -\lambda P(t)$ (see, for example, Massey and Kestelman (1964, p. 59)). The
solution (using the condition that $P(0) = 1$, i.e. it is certain that the molecule
is still intact at zero time) is, as in § A2.2,

$$P(t) = e^{-\lambda t}. \tag{A2.3.2}$$

Now in a large population of molecules the probability that a molecule
will be still intact at time $t$ can be identified with the proportion of molecules
that are still intact at time $t$, i.e. $y/y_0$ where $y$ is the concentration of
adrenaline at time $t$, and $y_0$ is the initial concentration. Equation (A2.3.2) is now
seen to be identical with the observed exponential decline of concentration
(eqn. (12.6.4)) if the rate constant, $k$, is identified with $\lambda$.

† See § A2.2. A fuller explanation of the nature of the term $o(\Delta t)$, which becomes
negligible for short enough time intervals, is given in § A2.5.

Furthermore, the probability that a molecule is still intact at time $t$,
given by (A2.3.2), can be identified, just as in § 5.1, with the probability
that a molecule has a lifetime greater than $t$ (if it did not it would not still
be intact). The probability that the lifetime is equal to or less than $t$ is
therefore, from the addition rule (2.4.3), and (A2.3.2),

$$1 - P(t) = 1 - e^{-\lambda t} = F(t), \tag{A2.3.3}$$

which is exactly like (5.1.4) (the distribution function, $F$, was defined
in (4.1.4)). This is consistent with (see § 5.1) the hypothesis that lifetimes
of individual adrenaline molecules are random variables following the
exponential distribution (see Figs. 5.1.1 and 5.1.2), with probability density
(from (4.1.5) and (A2.3.3))

$$f(t) = \frac{dF(t)}{dt} = \lambda e^{-\lambda t} \qquad (t \ge 0) \tag{A2.3.4}$$

as previously defined ((5.1.3) and (A1.1.10)). In other words, the mean
lifetime of molecules is $\lambda^{-1}$ (as explained in § 5.1 and proved in (A1.1.11)).

Referring again to the example in § 12.6, it can now be seen that the time
constant for the observed exponential fall in adrenaline concentration,
$k^{-1} = \lambda^{-1} = 13.85$ min (from (12.6.16)), can be interpreted as the mean
value of the lifetimes of individual adrenaline molecules (measured from
the time of addition of enzyme at $t = 0$, or, as shown in the following sections
of this appendix, from any other arbitrary time). It follows from the
arguments in § A2.6 that if adrenaline molecules were being synthesized in the
system, their mean lifetime measured from the moment of synthesis to the
moment of catabolism would also be $\lambda^{-1} = 13.85$ min.

Furthermore, the half-time for the observed decay of concentration,
$0.69315/k = 9.602$ min (from (12.6.6) and (12.6.17)), can be interpreted as
the median value of the lifetimes of individual adrenaline molecules, because
it was shown in (A1.1.14) that the population median of the exponential
distribution is $0.69315/\lambda$. Fifty per cent of molecules survive longer than
9.602 min.
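
The arithmetic of this section is easily reproduced. The sketch below takes the rate constant quoted in the text, computes the time constant and half-life, and then draws simulated exponential lifetimes to show that their mean and median fall near those two values. The simulated lifetimes are an illustration under the exponential model and are not data from the experiment.

```python
import math
import random
import statistics

k = 0.07219                      # estimated rate constant from the text, per minute

time_constant = 1.0 / k          # interpreted as the mean lifetime, minutes
half_life = math.log(2.0) / k    # interpreted as the median lifetime, minutes
print(round(time_constant, 2), round(half_life, 3))   # 13.85 9.602

# Simulated lifetimes of individual molecules (illustration only)
rng = random.Random(42)
lifetimes = [rng.expovariate(k) for _ in range(100_000)]
print(statistics.mean(lifetimes), statistics.median(lifetimes))
```

The sample mean and median will differ slightly from 13.85 and 9.602 min because of sampling error.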

A2.4. A stochastic view of the adsorption of molecules from 
solution 

Suppose that a surface (e.g. cell membrane) containing many identical 
and independent adsorption sites is immersed in a solution and is continually 
bombarded with solute molecules. Some of these will become adsorbed on to 
adsorption sites, remain on the sites for a time, and then desorb back into 
the solution. Macroscopic observations of the amount of material adsorbed 
can be related to what happens to individual molecules using the same sort
of approach as in §§ A2.2 and A2.3. This is, for example, the simplest model
for the interaction of drug molecules with cell receptor sites and, as such,
it is discussed by Rang and Colquhoun (1973).

Consider a single site. The probability that a site is occupied by an
adsorbed molecule at time $t$ will be denoted $P_1(t)$, and the probability that the
site is empty at time $t$ will be denoted $P_0(t)$. Thus, from (2.4.3),

$$P_0(t) = 1 - P_1(t). \tag{A2.4.1}$$

The probability that an empty site will become occupied would be
expected to be proportional to the rate at which solute molecules are
bombarding the surface, i.e. to the concentration, $c$ say, of the solute (assumed
constant). The probability that an empty site will become occupied during
the short interval of time $\Delta t$, between $t$ and $t + \Delta t$, will therefore be written
$\lambda c \Delta t$ where $\lambda$, as in §§ A2.2 and A2.3, is a constant (i.e. does not change
with time). The probability that an occupied site becomes empty during the
interval $\Delta t$ will not depend on the concentration of solute, and so will be
written $\mu \Delta t$, where $\mu$ is another constant. The probability that an occupied
site does not become empty during $\Delta t$ is therefore, from (2.4.3), $1 - \mu \Delta t$.†
Now a site will be occupied at time $t + \Delta t$ if either [(site was empty at time $t$)
and (site becomes occupied during the interval between $t$ and $t + \Delta t$)] or [(site was
occupied at time $t$) and (site does not become empty between $t$ and $t + \Delta t$)].

Now the probabilities of the four events in parentheses have been defined as
$P_0(t)$, $\lambda c \Delta t$, $P_1(t)$, and $(1 - \mu \Delta t)$ respectively. So, by application of the
addition rule (2.4.2), and the multiplication rule (2.4.6) (assuming, as in
§§ A2.2 and A2.3, that the events happening in the non-overlapping intervals
of time, from 0 to $t$ and from $t$ to $t + \Delta t$, are independent), it follows that the
probability that a site will be occupied at time $t + \Delta t$ will be

$$P_1(t + \Delta t) = P_0(t)\,\lambda c \Delta t + P_1(t)(1 - \mu \Delta t) + o(\Delta t), \tag{A2.4.2}$$

where $o(\Delta t)$ is a remainder term that includes the probability of several
transitions between occupied and empty states during $\Delta t$. As in §§ A2.2
and A2.3, $o(\Delta t)$ becomes negligible when $\Delta t$ is made very small. Rearranging
(A2.4.2) gives

$$\frac{P_1(t + \Delta t) - P_1(t)}{\Delta t} = P_0(t)\,\lambda c - P_1(t)\,\mu + \frac{o(\Delta t)}{\Delta t}.$$

Now let $\Delta t \to 0$. As before the left-hand side becomes, by definition of
differentiation (e.g. Massey and Kestelman (1964, p. 59)), $dP_1(t)/dt$, so, using
(A2.4.1) and (A2.2.1),

$$\frac{dP_1(t)}{dt} = \lambda c\,[1 - P_1(t)] - \mu P_1(t). \tag{A2.4.3}$$



† The probabilities should really be written $\lambda c \Delta t + o(\Delta t)$, $\mu \Delta t + o(\Delta t)$ and
$1 - \mu \Delta t - o(\Delta t)$, as in §§ A2.2 and A2.3, if the time interval, $\Delta t$, is finite.
Alternatively, it could be said, as in § A2.2, that the probability that an occupied
site becomes empty during the infinitesimal interval between $t$ and $t + dt$ can be
written $\mu\,dt$, etc. A fuller discussion of the nature of the $o(\Delta t)$ terms is given in
§ A2.5. All these terms have been gathered together and written as $o(\Delta t)$ in
(A2.4.2), which holds for finite time intervals.

If $P_1(t)$, the probability that an individual site is occupied at time $t$, is
interpreted as the proportion of a large population of sites that is occupied
at time $t$, then (A2.4.3) is exactly the same as the equation arrived at by a
conventional deterministic approach through the law of mass action, if
$\lambda$ and $\mu$ are identified with the mass action adsorption and desorption rate
constants. Thus $\lambda/\mu$ is the law of mass action affinity constant for the solute
molecule-site reaction. The derivation and solution of (A2.4.3), and its
experimental verification in pharmacology, are discussed by Rang and
Colquhoun (1973).
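
The behaviour of the occupancy equation $dP_1/dt = \lambda c(1 - P_1) - \mu P_1$ can be explored numerically. The following is a minimal sketch using Euler steps, with hypothetical values for the two rate constants and the concentration. The steady value is simply the occupancy at which the derivative is zero, and the comparison curve is the standard solution of a linear first-order equation starting from an empty surface. The function and variable names are illustrative only.

```python
import math


def occupancy(lam, mu, c, t, p_start=0.0, n_steps=50_000):
    """Euler integration of dP1/dt = lam*c*(1 - P1) - mu*P1 from P1(0) = p_start."""
    dt = t / n_steps
    p1 = p_start
    for _ in range(n_steps):
        p1 += (lam * c * (1.0 - p1) - mu * p1) * dt
    return p1


lam, mu, c = 2.0, 0.5, 1.0                # hypothetical rate constants and concentration
steady = lam * c / (lam * c + mu)         # occupancy at which dP1/dt = 0
tau = 1.0 / (lam * c + mu)                # time constant of the approach to that value

for t in (tau, 3 * tau, 10 * tau):
    exact = steady * (1.0 - math.exp(-t / tau))   # analytic solution for P1(0) = 0
    print(t, occupancy(lam, mu, c, t), exact)
```

With these assumed values the occupancy rises towards 0.8, and the numerical and analytic columns should agree closely at each time.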

The length of time for which an adsorption site is occupied; its distribution and 
mean 

In order to investigate the length of time for which a molecule remains
adsorbed consider the special case of (A2.4.3) with $\lambda c = 0$. The probability
of an adsorbed molecule desorbing does not depend on the probability, $\lambda$,
that an empty site will be filled, or on the concentration of solute, so this
does not spoil the generality of the argument. For example, at $t = 0$ the
surface, with a certain number of adsorbed molecules, might be transferred
to a solute-free medium (i.e. $c = 0$) so that adsorbed molecules are gradually
desorbed, but no further molecules can be adsorbed, so that a site that
becomes empty remains empty. When $c = 0$, (A2.4.3) becomes

$$\frac{dP_1(t)}{dt} = -\mu P_1(t). \tag{A2.4.4}$$

This equation has already been encountered in §§ A2.2 and A2.3. Integration
gives the probability that a site will be occupied, at time $t$ after transfer to
solute-free medium, as

$$P_1(t) = P_1(0)\, e^{-\mu t} \tag{A2.4.5}$$

where $P_1(0)$ is the probability that a site will be occupied at the moment of
transfer ($t = 0$). In other words, the proportion of sites occupied, and
therefore the amount of solute adsorbed, would be expected to fall exponentially
with rate constant $\mu$. Such exponential desorption has, in some cases, been
observed experimentally.

Now if the total number of adsorption sites is $N_{\text{tot}}$, then the number of
sites occupied at time $t$ will be $N(t) = N_{\text{tot}} P_1(t)$, and the number occupied
at $t = 0$ will be $N(0) = N_{\text{tot}} P_1(0)$. The proportion of initially occupied sites
that are still occupied after time $t$ will, from (A2.4.5), be

$$\frac{N(t)}{N(0)} = \frac{P_1(t)}{P_1(0)} = e^{-\mu t} \tag{A2.4.6}$$

and this will also be the probability that an individual site, that was occupied
at $t = 0$, will still be occupied after time $t$.†

† i.e. has been continuously occupied between 0 and $t$.

A site will only be occupied after time $t$ if the length of time for which the
molecule remains adsorbed (its lifetime) is greater than $t$, so (A2.4.6) is the
probability that the lifetime of an adsorbed molecule is longer than $t$. Analogous
situations were met in §§ 5.1 and A2.3. The probability that the lifetime of an
adsorbed molecule is $t$ or less is therefore, from (2.4.3),

$$P(0 \le \text{lifetime} \le t) \equiv F(t) = 1 - e^{-\mu t}. \tag{A2.4.7}$$

This is exactly like (5.1.4) and (A2.3.3), and is consistent with (see § 5.1) the
hypothesis implied by the physical model of identical and independent
adsorption sites, that the lifetime of individual adsorbed molecules is an
exponentially distributed variable, with probability density as before, from
(4.1.5),

$$f(t) = \frac{dF(t)}{dt} = \mu e^{-\mu t}. \tag{A2.4.8}$$

The mean lifetime of a molecule on an adsorption site is therefore $\mu^{-1}$ (from
(A1.1.11)), the observed time constant (see (12.6.4)) for desorption of adsorbed
molecules into a solute-free medium; and, just as in § A2.3, the observed
half-time for desorption, $0.69315/\mu$ (from (12.6.6)), can be interpreted, using
(A1.1.14), as the median lifetime of a molecule on an adsorption site. Fifty
per cent of molecules stick for a longer time than $0.69315/\mu$.

What is meant by lifetime? In the discussion above, the lifetime of an
adsorbed molecule was measured from the arbitrary instant ($t = 0$) when the
surface was transferred to solute-free medium until the instant when the
molecule desorbed. The average length of this residual lifetime (see § A2.7)
was $\mu^{-1}$. It is of more fundamental interest to know the average length of
time a molecule remains adsorbed, i.e. the lifetime measured from the instant
of adsorption to the instant of desorption. The mean length of this lifetime
is also $\mu^{-1}$, as implied in § 5.1. It might be expected that, because the
adsorbed molecules have already been adsorbed for some time at the time
that the surface is transferred to the solute-free medium, the lifetime
measured from moment of adsorption to moment of desorption would be
longer than $\mu^{-1}$ (see Fig. A2.7.3). This cannot be so because of the 'lack of
memory' or 'lack of ageing' of the Poisson process. It is nevertheless
surprising to most people, in just the same way as the analogous bus-waiting
time 'paradox' described in § 5.2 is, at first sight, surprising.

If the mean interval between bus arrivals (supposed random) is 10 min
then the mean waiting time from an arbitrary moment until the next bus was
stated in § 5.2 to be 10 min also, just as the mean waiting time from an arbitrary
moment until desorption (residual lifetime) is the same as the mean
time between adsorption and desorption (lifetime). In words, the reason for
this is that if one looks at the surface at an arbitrary moment of time† it is
more likely that it will contain long-lived molecule-adsorption site complexes
than short-lived ones which, because they exist for only a short time, do not
stand such a good chance of being in existence at any specified arbitrary
moment. Similarly in § 5.2, it is more probable that a person will arrive at
the bus stop during a long interval than a short one. In fact, the mean
lifetime (from the moment of adsorption to the moment of desorption) of
molecules present at an arbitrary moment (such as the moment when the
surface with its adsorbed molecules is transferred to solute-free medium) is
exactly twice the mean lifetime of all molecules, i.e. it is $2\mu^{-1}$, so the average
residual waiting time until desorption is $\mu^{-1}$ as stated, and the mean length
of time that a molecule has already been adsorbed at the arbitrary moment
is also $\mu^{-1}$ (cf. § 5.2). These statements are further discussed and proved in
§§ A2.6 and A2.7.

† An arbitrary moment of time means a time chosen by any method at all as long
as it is independent of the occurrence of events, i.e. independent of the times when
molecules move on and off adsorption sites in this case.
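
The statement that lifetimes in progress at an arbitrary moment are, on average, twice as long as lifetimes in general can be illustrated by simulation. The sketch below makes a simplifying assumption that is not in the text: exponential lifetimes are laid end to end in one continuous stream (as with the intervals between buses), and moments are then chosen uniformly at random along that stream. The rate, the sample sizes and the function name are hypothetical.

```python
import bisect
import itertools
import random
import statistics


def sample_arbitrary_moments(mu, n_intervals=200_000, n_moments=50_000, seed=7):
    """Return mean (all lifetimes, lifetime in progress, age, residual) at random moments."""
    rng = random.Random(seed)
    lifetimes = [rng.expovariate(mu) for _ in range(n_intervals)]
    ends = list(itertools.accumulate(lifetimes))   # instant at which each lifetime ends
    sampled, age, residual = [], [], []
    for _ in range(n_moments):
        t = rng.uniform(0.0, ends[-1])             # moment chosen independently of events
        i = min(bisect.bisect_right(ends, t), n_intervals - 1)
        start = ends[i - 1] if i > 0 else 0.0
        sampled.append(lifetimes[i])               # whole lifetime in progress at t
        age.append(t - start)                      # time already elapsed at t
        residual.append(ends[i] - t)               # time still to run after t
    return (statistics.mean(lifetimes), statistics.mean(sampled),
            statistics.mean(age), statistics.mean(residual))


mean_all, mean_sampled, mean_age, mean_residual = sample_arbitrary_moments(mu=0.1)
print(mean_all, mean_sampled, mean_age, mean_residual)
```

With a mean lifetime of 10 time units, the lifetimes caught in progress should average about 20, while the elapsed and residual parts should each average about 10, apart from sampling error.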



The length of time for which an adsorption site is empty 

The argument follows exactly the same lines as that presented above
for the average length of time for which a site is occupied. As above it is
convenient to consider the special case when the combination of molecule
with adsorption site is irreversible so that once occupied a site remains
occupied, i.e. $\mu = 0$ (the probability of an empty site becoming occupied
does not depend on $\mu$ so this does not spoil the generality of the arguments).
In this case, because it follows from (A2.4.1) that $dP_0/dt = -dP_1/dt$,
equation (A2.4.3) becomes

$$\frac{dP_0(t)}{dt} = -\lambda c\, P_0(t), \tag{A2.4.9}$$

which has exactly the same form as (A2.4.4). Using the same arguments as
above, it follows that the length of time for which a site remains empty is
an exponentially distributed random variable with a mean length of
$(\lambda c)^{-1}$. The mean length is inversely proportional to the concentration of
solute ($c$). As above, this is the lifetime measured either from an arbitrary
moment, or from the time when the site was last vacated by a desorbing
molecule.

Adsorption at equilibrium 

After a long time ($t \to \infty$) equilibrium will be reached, i.e. the rate at
which molecules are desorbed will be the same as the rate at which they are
adsorbed. Therefore, the proportion of sites occupied, $P_1$, will be constant,
i.e. $dP_1/dt = 0$. Equation (A2.4.3) gives

$$\lambda c\,(1 - P_1) - \mu P_1 = 0$$

from which it follows that, at equilibrium,

$$P_1 = \frac{\lambda c}{\lambda c + \mu} = \frac{Kc}{1 + Kc} \tag{A2.4.10}$$

if $K = \lambda/\mu$, the law of mass action affinity constant. This equation is the
hyperbola in §§ 12.8 and 14.6. Now it has been shown that the mean
length of time for which an individual site is occupied is $\mu^{-1}$, and the mean
length of time for which it is empty is $(\lambda c)^{-1}$. These values hold whether or






not equilibrium has been reached.† After transferring a membrane with
empty sites to a solution containing a constant concentration, $c$, of solute,
the empty sites will have to wait, on average, $(\lambda c)^{-1}$ seconds before they
become occupied so equilibration will take time; see Rang and Colquhoun
(1973). Using these values, it follows that

$$Kc = \frac{\text{occupied time}}{\text{empty time}} \tag{A2.4.11}$$

and therefore (A2.4.10) can be written

$$P_1 = \frac{1}{1 + (\text{empty time}/\text{occupied time})}. \tag{A2.4.12}$$

For example, if the probability that a site is occupied is $P_1 = 0.5$ (this will
be independent of time at equilibrium), i.e. 50 per cent of sites, on the average,
are occupied at any moment of time, it follows from (A2.4.12) that empty
time = occupied time, i.e. any given site is occupied for 50 per cent of the
time. This state is attained, at equilibrium, when $(\lambda c)^{-1} = \mu^{-1}$, i.e. when
the concentration of solute is $c = \mu/\lambda = 1/K$ (as inferred directly from
(A2.4.10)).

† If the movements of molecules could be observed, the mean length of time for
which a site was occupied could be measured, but the average would obviously have
to be taken over a long period of time, relative to $\mu^{-1}$ and $(\lambda c)^{-1}$, even if many sites,
rather than just one, were observed. It can be shown that the time constant for
equilibration of the sites is $(\lambda c + \mu)^{-1}$ (see, for example, Rang and Colquhoun (1973)),
so in fact the average can only be given a frequency interpretation over a period that
is long relative to the time taken to reach equilibrium.

A2.5. The relation between the lifetime of individual radioisotope
molecules and the interval between disintegrations

The examples of random intervals between miniature end plate potentials
(MEPP) (discussed in § 5.1) and between bus arrivals (in § 5.2) were
straightforward in that there was in each case a single continuous stream of events.
In the case of radioisotope disintegration (§§ 3.5-3.7), catabolism of
adrenaline (§ A2.3), or adsorption of solute molecules (§ A2.4) the situation is
not quite the same. For each isotopic atom there is only one event,
disintegration. Nevertheless the random intervals between MEPP or buses have the
same properties as the random intervals defined as the lifetimes of isotope
atoms (or adrenaline molecules, or solute molecule-adsorption site complexes).

The mean lifetime of isotope molecules, measured from any arbitrary
time (see §§ A2.3, A2.4, A2.6, and A2.7) may be thousands of years. For
example the half-life (i.e. median lifetime, see § A2.3) of carbon-14 molecules
is 5760 years, so the mean lifetime of a molecule is $\lambda^{-1} = 5760/0.69315 = 8310$
years (from (A1.1.11) and (A1.1.14)), i.e. multiplying by the number
of seconds in a Gregorian year, $8310 \times 3.155695 \times 10^{7} = 2.6224 \times 10^{11}$ s.
This is obviously independent of the amount of $^{14}$C present.

However, in § 3.7 the Poisson distribution considered was that of the
number of disintegrations per second (it will be assumed, for the sake of
example, that the isotope involved was $^{14}$C). Because this variable is
Poisson-distributed with mean, for the example in § 3.7, of $\lambda' t = \lambda' = 2089.5$
disintegrations per second, the mean number of events in $t = 1$ second (assuming that
the counter detects all disintegrations), it follows from the arguments in
§ 5.1 that the intervals between disintegrations are exponentially distributed
with mean interval $(\lambda')^{-1} = 1/2089.5 = 0.000478583$ second (this obviously
depends on the amount of $^{14}$C present). Compare this with the lifetimes
of individual molecules that are also exponentially distributed with mean
lifetime $\lambda^{-1} = 8310$ years. These two exponential distributions are, as
expected, closely related. This will now be shown.

The probability that any individual $^{14}$C atom disintegrates in an interval
of time of length $\Delta t$, from the arguments in §§ 5.1 and A2.3, must be $\lambda \Delta t$.†
Suppose that at time $t$ a sample of $^{14}$C contains $N(t)$ undisintegrated $^{14}$C
atoms. Define as an 'event' the disintegration of any of these atoms, i.e.
if the atoms could be numbered, the disintegration of either atom number
1 or atom number 2 or ... or atom number $N(t)$. The probability of this
event occurring in an interval of time of length $\Delta t$ is, from the addition rule
(2.4.1),

$$\lambda \Delta t + \lambda \Delta t + \cdots + \lambda \Delta t - o(\Delta t) = N(t)\lambda \Delta t - o(\Delta t), \tag{A2.5.1}$$

where $o(\Delta t)$ is a remainder term (see (A2.2.1)) that includes all the
probabilities of more than one disintegration occurring during $\Delta t$, which will be
negligible when $\Delta t$ is made very small. The argument now follows exactly
the same lines as in § A2.3. Define the probability that no event occurs up
to time $t$ as $P(t)$. No event will occur up to time $t + \Delta t$ if (no event occurs up
to $t$) and (no event occurs between $t$ and $t + \Delta t$), and the probability of this is,
from the multiplication rule (2.4.6),

$$P(t + \Delta t) = P(t)\,[1 - N(t)\lambda \Delta t + o(\Delta t)]. \tag{A2.5.2}$$

Rearranging this and allowing $\Delta t \to 0$ gives, as in (A2.2.2) and (A2.3.1),
$dP(t)/dt = -N(t)\lambda P(t)$. Now if the length of time considered is short enough
for the decay of the radioisotope to be negligible (as assumed in § 3.7) then
$N(t)$ can be treated as a constant. It follows that the solution for $P(t)$,
using the condition that $P(0) = 1$ (i.e. it is certain that no events will
occur in zero time), will be, as before,

$$P(t) = e^{-N(t)\lambda t}, \tag{A2.5.3}$$

just as (A2.3.2). This probability, that no disintegration will occur up to
time $t$, can be identified with the probability that the interval between
disintegrations is longer than $t$. Using the same arguments as in § A2.3 it
follows that the interval between disintegrations is an exponentially
distributed variable with a mean length, defined above as $(\lambda')^{-1}$, of $(N(t)\lambda)^{-1}$,
and the mean number of disintegrations per second is therefore

$$\lambda' = N(t)\lambda \tag{A2.5.4}$$

which decreases, as expected, as the total number of isotope molecules,
$N(t)$, decreases. The intervals will, of course, only be exponentially distributed,
and the disintegration rate will only be Poisson distributed, over time
intervals short enough for $N(t)$ to be substantially constant. Using (A2.5.4)
and the figures given above for the example in § 3.7 shows that the number
of $^{14}$C atoms present at the time the sample was counted must have been

$$N(t) = \lambda'/\lambda = 2089.5\ (\text{atoms s}^{-1}) \times 2.6224 \times 10^{11}\ (\text{s}) = 5.4795 \times 10^{14}\ \text{atoms}.$$

Therefore the weight of $^{14}$C was

$$5.4795 \times 10^{14} / (6.023 \times 10^{23}) = 9.098 \times 10^{-10}\ \text{gramme molecules},$$

or $9.098 \times 10^{-10} \times 14 = 12.74 \times 10^{-9}$ g.

† This probability should really be written $\lambda \Delta t + o(\Delta t)$, if $\Delta t$ is finite, as in
§§ A2.3 and A2.4. The nature of the $o(\Delta t)$ terms, and a more rigorous derivation of
(A2.5.1), are discussed at the end of this section.
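
The chain of arithmetic in this example can be retraced in a few lines. The sketch below uses the figures quoted in the text (half-life, seconds per year, Avogadro's number as printed, and the counting rate from § 3.7). The variable names are illustrative, and because the text rounds the mean lifetime to 8310 years the results agree with the printed figures only to about four significant figures.

```python
import math

half_life_years = 5760.0                   # half-life of carbon-14 used in the text
seconds_per_year = 3.155695e7              # Gregorian year, as in the text
avogadro = 6.023e23                        # value used in the text

# Mean lifetime = half-life / ln 2, converted to seconds
mean_lifetime_s = half_life_years / math.log(2.0) * seconds_per_year
lam = 1.0 / mean_lifetime_s                # probability per second that a given atom decays
rate_observed = 2089.5                     # disintegrations per second, example of § 3.7

n_atoms = rate_observed / lam              # from lambda' = N * lambda, (A2.5.4)
moles = n_atoms / avogadro                 # gramme molecules of carbon-14
grams = moles * 14.0                       # weight of carbon-14
mean_interval_s = 1.0 / rate_observed      # mean interval between disintegrations
print(mean_lifetime_s, n_atoms, moles, grams, mean_interval_s)
```

The very long mean lifetime of one atom and the very short mean interval between disintegrations are linked only through the number of atoms present.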

A more careful look at the nature of the $o(\Delta t)$ terms in processes like the
catabolism of adrenaline, the decay of radioisotopes and the adsorption of
molecules

The basic Poisson process consists of a continuous stream of events, such as the
occurrence of miniature end plate potentials (see § 5.1) or the random arrivals of
buses at a bus stop. It was shown in § A2.2 that in this sort of process the
probability of one event occurring in a finite time interval $\Delta t$ can be written as
$\lambda \Delta t + o(\Delta t)$. Obviously this probability cannot be written simply as $\lambda \Delta t$ because
this would become indefinitely large, if long enough time intervals were considered,
whereas all probabilities must be less than 1.

In processes like the catabolism of adrenaline, the decay of radioisotopes, or
the adsorption of molecules, the situation is not quite the same. Each adrenaline
molecule can only be destroyed once, so one cannot consider the probability of it
being 'destroyed $r$ times during $\Delta t$' as in § A2.2. Nevertheless it clearly will not
do to say that the probability of catabolism (decay, adsorption, etc.) during
$\Delta t$ is $\lambda \Delta t$, because, as above, this can be greater than 1. Suppose that this
probability can be written $\lambda \Delta t + o(\Delta t)$. The argument in the first part of this section
can now be made more rigorous.

The catabolism (decay, etc.) of different atoms during a finite time $\Delta t$ are not
mutually exclusive events, so the simple addition rule cannot be used. Instead,
the binomial theorem should be used. In the language of §§ 3.2-3.4, let a 'trial'
be the observing of a molecule during the time $\Delta t$, and let a 'success' be the
occurrence of catabolism (decay, etc.) during this period. If, as above, there are $N$
molecules present altogether, then the probability that one of them will be
catabolized (decay, etc.) during $\Delta t$ can be identified with the probability of
$r = 1$ success occurring in $N$ trials, and this is given by the binomial distribution,
(3.4.3), as $N\mathcal{P}(1-\mathcal{P})^{N-1}$, where it has been supposed that the probability of
success at each trial can be written $\mathcal{P} = \lambda\Delta t + o(\Delta t)$. This probability is the same
at every trial, as discussed in § 3.5. Substituting it for $\mathcal{P}$ in $N\mathcal{P}(1-\mathcal{P})^{N-1}$, and
expanding the resulting expression (use the binomial expansion on $(1-\mathcal{P})^{N-1}$),
it is found that the required probability of one of the $N$ molecules being
catabolized during $\Delta t$ can indeed be written

$N\mathcal{P}(1-\mathcal{P})^{N-1} = N\lambda\Delta t + o(\Delta t)$ (A2.5.5)

as asserted, on the basis of a simplified argument, in (A2.5.1).
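
A numerical illustration of what the $o(\Delta t)$ remainder means may help here. The sketch below assumes, purely for simplicity, that the single-molecule probability is exactly $\lambda\Delta t$ (its own $o(\Delta t)$ part is neglected) and uses arbitrary illustrative values for $N$ and $\lambda$. It shows that the difference between the binomial probability and $N\lambda\Delta t$, divided by $\Delta t$, shrinks towards zero as $\Delta t$ is made smaller.

```python
def prob_exactly_one(n_molecules, lam, dt):
    """Binomial probability of exactly one 'success' in n_molecules trials."""
    p = lam * dt  # single-molecule probability; its own o(dt) part is neglected here
    return n_molecules * p * (1.0 - p) ** (n_molecules - 1)


def remainder_over_dt(n_molecules, lam, dt):
    """(binomial probability - N*lambda*dt) / dt, which should tend to 0 with dt."""
    return (prob_exactly_one(n_molecules, lam, dt) - n_molecules * lam * dt) / dt


# Arbitrary illustrative values: 1000 molecules, rate constant 0.5 per unit time.
for dt in (1e-3, 1e-4, 1e-5, 1e-6):
    print(dt, prob_exactly_one(1000, 0.5, dt), remainder_over_dt(1000, 0.5, dt))
```

The last column falls roughly tenfold for each tenfold reduction in $\Delta t$, which is what is meant by saying that the remainder is $o(\Delta t)$.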

This argument can now be turned upside down, starting from the experimental
observations and working backwards. The decay of radioisotopes and, in some
circumstances at least, the catabolism of molecules and the desorption
of adsorbed molecules, are observed to follow an exponential time course. In each
case the implication is that the probability that a molecule is still intact at time
$t$, $1-F(t)$, is $e^{-\lambda t}$. This is consistent, as described in earlier sections, with the
physical model that specifies that the lifetime of individual molecules is an
exponentially distributed variable with mean $\lambda^{-1}$. In the case of radioisotope
decay, this can be confirmed experimentally by the observation that the number
of disintegrations in unit time is Poisson distributed (over times during which $N$
is substantially constant). Now, if the number of molecules catabolized, etc.
during $\Delta t$ is Poisson distributed, and the mean number of events during $\Delta t$ is
$\lambda'\Delta t$ as above, then the probability that one molecule of the $N$ present will be
catabolized, etc. during $\Delta t$ is given by the Poisson distribution, (3.5.1), with
$r = 1$ and $m = \lambda'\Delta t$, i.e. it is $\lambda'\Delta t\,e^{-\lambda'\Delta t}$. Substituting $\lambda' = N\lambda$ from (A2.5.4), and
expanding the exponential term exactly as in (A2.2.9), gives the probability of
one of the $N$ molecules being catabolized, etc. during $\Delta t$ as

$N\lambda\Delta t\,e^{-N\lambda\Delta t} = N\lambda\Delta t + o(\Delta t)$, (A2.5.6)

just as in (A2.5.5) and (A2.5.1). Now, according to the argument above, this can
be equated with $N\mathcal{P}(1-\mathcal{P})^{N-1}$, where $\mathcal{P}$ is the probability of any individual
molecule being catabolized during $\Delta t$. The only two solutions of this equation for
$\mathcal{P}$ are $\mathcal{P} = \lambda\Delta t$ or $\mathcal{P} = \lambda\Delta t + o(\Delta t)$. The former will not do, as explained above,
so the probability must be written $\lambda\Delta t + o(\Delta t)$ as asserted in §§ A2.3-A2.5.
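
The agreement between the Poisson and binomial expressions to first order in $\Delta t$ can be checked numerically. This is a minimal sketch with arbitrary illustrative values of $N$ and $\lambda$, again taking the single-molecule probability to be exactly $\lambda\Delta t$. The function names are hypothetical.

```python
import math


def poisson_one(n_molecules, lam, dt):
    """Poisson probability of exactly one event, with mean m = N*lambda*dt."""
    m = n_molecules * lam * dt
    return m * math.exp(-m)


def binomial_one(n_molecules, lam, dt):
    """Binomial probability of exactly one success in N trials, p = lambda*dt."""
    p = lam * dt
    return n_molecules * p * (1.0 - p) ** (n_molecules - 1)


def gap_over_dt(n_molecules, lam, dt):
    """Difference between the two expressions, relative to dt."""
    return abs(poisson_one(n_molecules, lam, dt) - binomial_one(n_molecules, lam, dt)) / dt


for dt in (1e-3, 1e-4, 1e-5, 1e-6):
    print(dt, poisson_one(1000, 0.5, dt), binomial_one(1000, 0.5, dt), gap_over_dt(1000, 0.5, dt))
```

Both expressions approach $N\lambda\Delta t$, and the gap between them, divided by $\Delta t$, also tends to zero, so they differ only in their $o(\Delta t)$ terms.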



A2.6. Why the waiting time until the next event does not 
depend on when the timing is started, for a Poisson 
process 

The assertion that waiting time does not depend on when timing was
started has been made repeatedly in Chapter 5 and this appendix. For
example, the mean waiting time until a molecule is desorbed does not depend
on the arbitrary time when the timing is started, and will be the same,
$\lambda^{-1}$, as if the timing were started from the moment the molecule was adsorbed.

Suppose that the interval from one event to the next is exponentially
distributed with mean $\lambda^{-1}$. It will be convenient, as at the end of § 4.1,
to use $\mathbf{t}$ to stand for time measured from the last event considered as a
random variable, and $t$, $t_0$, etc., to stand for particular values of $\mathbf{t}$. Suppose
that a time $t_0$ is known to have elapsed from the last event. Given this fact,
what is the probability that the time from $t_0$ until the next event (the
residual lifetime) is less than any specified time $t$, i.e. what is the probability
that $\mathbf{t} < t_0+t$ (event $E_1$, say) given that $\mathbf{t} > t_0$ (event $E_2$, say)? In symbols
this is $P(E_1|E_2)$, i.e. from the definition of conditional probability (2.4.4),

$P(\mathbf{t} < t_0+t \mid \mathbf{t} > t_0) = \dfrac{P(\mathbf{t} < t_0+t \text{ and } \mathbf{t} > t_0)}{P(\mathbf{t} > t_0)}$. (A2.6.1)

Now the event that ($\mathbf{t} < t_0+t$ and $\mathbf{t} > t_0$) is the same as the event
$t_0 < \mathbf{t} < t_0+t$ and, because the intervals between events are being supposed
to follow the exponential distribution (5.1.3), $f(t) = \lambda e^{-\lambda t}$ with mean interval
between events $= \lambda^{-1}$, the probability of this is, as in (4.1.2),

$P(t_0 < \mathbf{t} < t_0+t) = \int_{t_0}^{t_0+t} \lambda e^{-\lambda u}\,du$

$= [-e^{-\lambda u}]_{t_0}^{t_0+t} = e^{-\lambda t_0} - e^{-\lambda(t_0+t)}$

$= e^{-\lambda t_0}(1 - e^{-\lambda t})$. (A2.6.2)

The denominator of (A2.6.1) is (cf. (5.1.4)),

$P(\mathbf{t} > t_0) = 1 - F(t_0) = e^{-\lambda t_0}$. (A2.6.3)



Substituting (A2.6.2) and (A2.6.3) into (A2.6.1) gives the required conditional
distribution function (cf. (4.1.4)) for the residual lifetime, $t$ (measured from
$t_0$ to the next event), as

$P(\mathbf{t} < t_0+t \mid \mathbf{t} > t_0) = \dfrac{e^{-\lambda t_0} - e^{-\lambda(t_0+t)}}{e^{-\lambda t_0}} = 1 - e^{-\lambda t}$, (A2.6.4)

which is identical with the distribution function ((5.1.4) or (A2.3.3)) for
the intervals between events (measured from the last event to the next event).
Differentiating, as in (A2.3.4), gives the probability density for the residual
lifetime, $t$, as $f(t) = \lambda e^{-\lambda t}$, the exponential distribution with mean $\lambda^{-1}$
(from (A1.1.11)), exactly the same as the distribution of intervals between
events. The common-sense reason for this curious result has been discussed
in words in §§ 5.2 and A2.4, and is proved in § A2.7.
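
The result can be checked both from the formula and by simulation. This is a rough sketch; the rate, the elapsed time $t_0$ and the sample size are arbitrary choices made for illustration, and the function names are hypothetical.

```python
import math
import random


def residual_cdf(lam, t0, t):
    """P(T < t0 + t | T > t0) for an exponentially distributed interval T."""
    numerator = math.exp(-lam * t0) - math.exp(-lam * (t0 + t))  # as in (A2.6.2)
    denominator = math.exp(-lam * t0)                            # as in (A2.6.3)
    return numerator / denominator


def simulated_residual_cdf(lam, t0, t, n_intervals=200_000, seed=1):
    """Monte Carlo estimate of the same conditional probability."""
    rng = random.Random(seed)
    survivors = 0       # intervals that lasted longer than t0
    ended_in_time = 0   # survivors that ended before t0 + t
    for _ in range(n_intervals):
        interval = rng.expovariate(lam)
        if interval > t0:
            survivors += 1
            if interval < t0 + t:
                ended_in_time += 1
    return ended_in_time / survivors


lam, t = 0.1, 10.0
for t0 in (0.0, 5.0, 50.0):
    # The conditional probability is the same whatever t0 is.
    print(t0, residual_cdf(lam, t0, t), 1.0 - math.exp(-lam * t))
print(simulated_residual_cdf(lam, 5.0, t))
```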

A2.7. Length-biased sampling. Why the average length of the 
interval in which an arbitrary moment of time falls is 
twice the average length of all intervals for a Poisson 
process 

In § 5.2 it was stated that if buses arrive randomly with an average interval
of 10 min then, if a person arrives at the bus stop at an arbitrary time, the
mean length of the interval in which he arrives is 20 min. Similarly, in
§ A2.4 it was asserted that the mean lifetime of adsorbed molecule-adsorption
site complexes in existence at a specified arbitrary moment of
time was twice the average lifetime. In each case this was explained by
saying that a long interval has a better chance than a short one of including
the arbitrary moment, i.e. the interval lengths are not randomly sampled
by choosing one that includes an arbitrary time, just as rods of different
length would, doubtless, not be randomly sampled by picking a rod out of
a bag containing well mixed rods. The long rods would stand a better chance
of being picked. Sampling of this sort is described as length-biased (see,
for example, Cox 1962, p. 66).

The specifying of the arbitrary moment of time constitutes the choice of 
an interval (the interval in which the time falls) from the population of 
intervals between events. Imagine that intervals are repeatedly chosen in
this way. What will their average length be? First, the distribution of their
length must be found.

The distribution of intervals chosen by length-biased sampling

One difficulty in deriving the required result arises because it is necessary
to consider an infinite population of intervals. It will be much easier to
start off with a finite population. Imagine a finite set of $N$ intervals, and call
the length of an interval (the $i$th interval) $t_i$. The total length of time occupied
by the intervals is thus $\sum_{\text{all } t_i} t_i$. The fraction of this total time occupied by the
$i$th interval will be

$\dfrac{t_i}{\sum_{\text{all } t_i} t_i}$. (A2.7.1)

If these fractions are added up for all intervals that are longer than some
specified length $t$, the result is the proportion of time occupied by intervals
longer than $t$:

$\dfrac{\sum_{t_i > t} t_i}{\sum_{\text{all } t_i} t_i} = \dfrac{\text{time occupied by intervals longer than } t}{\text{total time}}$ (A2.7.2)

= probability that a point chosen at random† falls in an interval longer than $t$ (A2.7.3)

$\equiv 1 - F_1(t)$ (A2.7.4)

if $F_1(t)$ stands for the distribution function of intervals chosen by length-biased
sampling (defined as the proportion of intervals thus chosen with
length less than the specified value, $t$, so $1-F_1(t)$ is the proportion with
length greater than $t$; see (4.1.4) and (5.1.4)).

† i.e. a point chosen at random with the uniform (or rectangular) distribution over
the interval 0 to $\sum_{\text{all } t_i} t_i$.

The crucial step, the equating of (A2.7.2) and (A2.7.3), certainly looks
reasonable. Another way of looking at it is to suppose that the probability
of choosing any particular interval is directly proportional to its length,
$t_i$, so longer intervals are more likely to be chosen. The proportionality
constant must be chosen so that all the probabilities add up to 1, because it is
certain that one interval or another will be chosen. The proportionality
constant is therefore $1/\sum_{\text{all } t_i} t_i$, giving

probability of choosing an interval of length $t_i$ = constant $\times\ t_i = \dfrac{t_i}{\sum_{\text{all } t_i} t_i}$. (A2.7.5)

It follows, using the addition rule, (2.4.2), that the probability of choosing
an interval longer than $t$ is found by adding these probabilities for all intervals
longer than $t$, giving

probability of choosing an interval longer than $t$ = $\dfrac{\sum_{t_i > t} t_i}{\sum_{\text{all } t_i} t_i}$, (A2.7.6)

which is exactly the same as found above, eqn. (A2.7.3).
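
The finite-population argument can be tried out directly. The sketch below uses a made-up set of four interval lengths. It computes the proportion of time occupied by intervals longer than $t$, and compares it with the proportion of uniformly chosen random points that fall in such intervals when the intervals are laid end to end. The function names and the example lengths are illustrative assumptions.

```python
import bisect
import random
from itertools import accumulate


def time_fraction_longer(intervals, t):
    """Proportion of the total time occupied by intervals longer than t."""
    return sum(x for x in intervals if x > t) / sum(intervals)


def random_point_fraction(intervals, t, n_points=100_000, seed=1):
    """Fraction of uniformly chosen time points that fall in an interval longer than t."""
    rng = random.Random(seed)
    ends = list(accumulate(intervals))  # end-point of each interval, laid end to end
    total = ends[-1]
    hits = 0
    for _ in range(n_points):
        point = rng.uniform(0.0, total)
        # Index of the interval containing the point (guarding the upper end-point).
        i = min(bisect.bisect_right(ends, point), len(intervals) - 1)
        if intervals[i] > t:
            hits += 1
    return hits / n_points


lengths = [1.0, 2.0, 3.0, 4.0]  # hypothetical interval lengths
print(time_fraction_longer(lengths, 2.5))   # (3 + 4) / 10
print(random_point_fraction(lengths, 2.5))  # close to the value above
```

Only half of these intervals are longer than 2.5, but they occupy well over half of the time, so a random point lands in one of them more often than not.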

Now suppose that in the finite population of $N$ intervals, some of the
intervals are of identical lengths. There are $f_i$ intervals of length $t_i$, say
(so $\sum f_i = N$). The time occupied by the $f_i$ intervals of length $t_i$ must be $f_i t_i$,
and the total time occupied by all $N$ intervals must be $\sum_{\text{all } t_i} f_i t_i$. The proportion
of the total time occupied by intervals longer than a specified value, $t$,
by modification of (A2.7.2) (or (A2.7.6)) must now be written

$\dfrac{\text{time occupied by intervals longer than } t}{\text{total time}}$

= probability that an interval chosen by length-biased sampling is longer than $t$ (A2.7.7)

$= \dfrac{\sum_{t_i > t} f_i t_i}{\sum_{\text{all } t_i} f_i t_i} = \dfrac{\sum_{t_i > t} t_i P_i}{\sum_{\text{all } t_i} t_i P_i}$ (A2.7.8)

if $P_i$ is defined as $f_i/N$, the proportion of intervals of length $t_i$ in the population.
The values of $P_i$ define the (discontinuous) distribution of interval
lengths, $t_i$, in the finite population under consideration.

It is now possible, at last, to revert to the real problem, in which there is
an infinite population of intervals and the intervals can potentially be of
any length, i.e. they have a continuous distribution (see Chapters 4 and 5).
All that is necessary is to replace $P_i$ by $dP = f(t)\,dt$ (from (4.1.1)). As described
in Chapter 4, $dP$ is the probability that the length of an interval will lie
within the very narrow range between $t$ and $t+dt$. When this is substituted
in (A2.7.8) the summations must, of course, be replaced by integrations.
The result is

proportion of time occupied by intervals longer than $t$

= probability that an interval chosen by length-biased sampling is longer than $t$

$= 1 - F_1(t) = \dfrac{\int_t^\infty u f(u)\,du}{\int_0^\infty u f(u)\,du}$. (A2.7.9)



[Figure A2.7.1. Abscissa: $\lambda t$ (duration of interval as a multiple of the mean of all intervals).]

Fig. A2.7.1. Distributions of the length of random intervals. The abscissa
is plotted as in Figs. 5.1.1 and A2.7.2. The distribution of durations in the
population, $f(t)$, is the exponential distribution, exactly as in Fig. 5.1.2. The
distribution of the lengths of intervals chosen by length-biased sampling, $f_1(t)$,
shows that relatively few short intervals will be chosen, and the mean interval is
twice as long as the mean of the whole population. If the abscissa is multiplied by
$\lambda^{-1}$ to convert it into time units, the probability density would be divided by
$\lambda^{-1}$, so the area under the curves would remain 1.0.






For the exponential distribution of intervals in the population, which is
what we are interested in, substitute the definition of this distribution,
$f(t) = \lambda e^{-\lambda t}$, in (A2.7.9). The integral in the denominator of (A2.7.9) has
already been shown in (A1.1.11) to be $\lambda^{-1}$. The numerator of (A2.7.9),
integrating by parts exactly as in (A1.1.11), is

$\int_t^\infty u\,\lambda e^{-\lambda u}\,du = (\lambda^{-1} + t)e^{-\lambda t}$. (A2.7.10)

Substituting these results in (A2.7.9) gives

$1 - F_1(t) = \dfrac{(\lambda^{-1} + t)e^{-\lambda t}}{\lambda^{-1}} = (1+\lambda t)e^{-\lambda t}$ (A2.7.11)

as the proportion of intervals longer than $t$, when the intervals are chosen
by length-biased sampling. Compare this with the proportion of intervals
longer than $t$ in the whole population which, from (5.1.4) or (A1.1.12), is
$1 - F(t) = e^{-\lambda t}$. The cumulative distributions are plotted in Fig. A2.7.2.

The proportion of intervals longer than the mean interval 

The mean length of all intervals in an exponentially distributed population
is $\lambda^{-1}$, as proved in (A1.1.11). It was shown in (A1.1.13) that 63.21 per cent
of all intervals are shorter than the mean, $\lambda^{-1}$. Therefore $100 - 63.21 = 36.79$
per cent of all intervals are longer than the mean length. The proportion of
time occupied by intervals that are longer than the mean follows directly
from (A2.7.11) and (A2.7.9), putting $t = \lambda^{-1}$, and is thus

$(1 + \lambda\lambda^{-1})e^{-\lambda\lambda^{-1}} = 2e^{-1} = 0.7358$, i.e. 73.58 per cent. (A2.7.12)

Thus, although only 36.79 per cent of intervals in the population are
longer than the mean length, this 36.79 per cent occupy 73.58 per cent of
the time, and this is one way of looking at the reason for there being a greater
chance of an arbitrary time falling in a long interval than a short interval.
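
The two percentages can be reproduced from the formulae for $1 - F(t)$ and $1 - F_1(t)$. This is a small sketch; the function names are illustrative, and the value of $\lambda$ is arbitrary because the result at $t = \lambda^{-1}$ does not depend on it.

```python
import math


def fraction_of_intervals_longer(lam, t):
    """1 - F(t): proportion of all intervals longer than t (exponential population)."""
    return math.exp(-lam * t)


def fraction_of_time_longer(lam, t):
    """1 - F1(t): proportion of time occupied by intervals longer than t, (A2.7.11)."""
    return (1.0 + lam * t) * math.exp(-lam * t)


lam = 0.1            # arbitrary rate; mean interval is 1/lam = 10
mean = 1.0 / lam
print(100 * fraction_of_intervals_longer(lam, mean))  # about 36.79 per cent
print(100 * fraction_of_time_longer(lam, mean))       # about 73.58 per cent
```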

The mean length of an interval chosen by length-biased sampling 

The question posed at the beginning of this section can now be answered.
The probability density function (see § 4.1) defining the distribution of
lengths of intervals chosen by length-biased sampling follows from (A2.7.11),
using (4.1.5), and is

$f_1(t) = \dfrac{dF_1(t)}{dt} = \dfrac{d}{dt}\left[1 - (1+\lambda t)e^{-\lambda t}\right]$

$= \lambda^2 t e^{-\lambda t}$ (for $t > 0$). (A2.7.13)

This distribution curve is drawn in Fig. A2.7.1, and compared with the
distribution curve, $f(t) = \lambda e^{-\lambda t}$, for all intervals in the population.



[Figure A2.7.2. Abscissa: $\lambda t$ (duration of interval as a multiple of the mean of all intervals).]

Fig. A2.7.2. Cumulative distributions of the lengths of random intervals.
The distribution function, $F(t)$, for the lengths of all intervals is exactly as in
Fig. 5.1.1. The abscissa is the interval length as a multiple of the mean length of
all intervals, i.e. it is $\lambda t$ as in Fig. 5.1.1. If the mean length of all intervals were
$\lambda^{-1} = 10$ s, the figures on the abscissa would be multiplied by 10 to convert them
to seconds. The cumulative distribution, $F_1(t)$, for intervals chosen by length-biased
sampling, is seen to have more long intervals than there are in the whole
population, the mean being $2\lambda^{-1}$ (i.e. 20 s in the example above).




The mean length of an interval chosen by length-biased sampling now
follows from (A1.1.2), and is

$E(\mathbf{t}) = \int_0^\infty t f_1(t)\,dt = \int_0^\infty \lambda^2 t^2 e^{-\lambda t}\,dt$. (A2.7.14)

To solve this, integrate by parts (see, for example, Massey and Kestelman
(1964, pp. 332, 402)), as in (A1.1.11). Put $u = t^2$, so $du = 2t\,dt$, and put
$dv = \lambda^2 e^{-\lambda t}\,dt$, so $v = \int \lambda^2 e^{-\lambda t}\,dt = -\lambda e^{-\lambda t}$. Thus

$E(\mathbf{t}) = \int u\,dv = [uv] - \int v\,du$

$= \left[-t^2 \lambda e^{-\lambda t}\right]_0^\infty - \int_0^\infty (-\lambda e^{-\lambda t})(2t\,dt)$

$= 2\int_0^\infty t\,\lambda e^{-\lambda t}\,dt = 2\lambda^{-1}$, (A2.7.15)

i.e. twice the mean ($\lambda^{-1}$) of all intervals, as stated. In the evaluation of the
first term on the second line of (A2.7.15) notice that $t^2 e^{-\lambda t} \to 0$ as $t \to \infty$;
see, for example, Massey and Kestelman (1964, p. 122). The integral on the
third line of (A2.7.15) is simply the mean of the exponential distribution,
shown in (A1.1.11) to be $\lambda^{-1}$.
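
The factor of two can also be seen in a simulation of the bus example. This sketch assumes exponentially distributed intervals with a mean of 10 min and picks arbitrary moments uniformly over the whole record. The function name, sample sizes and seed are illustrative choices, not part of the original argument.

```python
import bisect
import random
from itertools import accumulate


def mean_sampled_interval(mean_interval, n_events=200_000, n_probes=100_000, seed=1):
    """Return (mean of all intervals, mean of intervals containing an arbitrary moment)."""
    rng = random.Random(seed)
    gaps = [rng.expovariate(1.0 / mean_interval) for _ in range(n_events)]
    arrival_times = list(accumulate(gaps))  # time of each event
    total = arrival_times[-1]
    picked = 0.0
    for _ in range(n_probes):
        moment = rng.uniform(0.0, total)
        # Interval i runs from arrival_times[i-1] (or 0) to arrival_times[i].
        i = min(bisect.bisect_right(arrival_times, moment), n_events - 1)
        picked += gaps[i]
    return sum(gaps) / n_events, picked / n_probes


all_mean, biased_mean = mean_sampled_interval(10.0)
print(all_mean)     # close to 10 min
print(biased_mean)  # close to 20 min
```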






Tables 



Table Al 
Nonparametric confidence limits for the median 

See §§ 7.3 and 10.2. Rank the n observations and take the rth from each end as
limits.† With samples smaller than n = 6, 95 per cent limits cannot be found,
but the P value for the limits formed by the largest and smallest (r = 1)
observations is given (Nair 1940).



Sample      P approx. 95 per cent      P approx. 99 per cent
size n        r       100P               r       100P

   2          1†      50.0
   3          1†      75.0
   4          1†      87.5
   5          1†      93.75
   6          1       96.88
   7          1       98.44
   8          1       99.22              1       99.22
   9          2       96.10              1       99.60
  10          2       97.86              1       99.80
  11          2       98.82              1       99.90
  12          3       96.14              2       99.38
  13          3       97.76              2       99.66
  14          3       98.70              2       99.82
  15          4       96.48              3       99.26
  16          4       97.88              3       99.58
  17          5       95.10              3       99.76
  18          5       96.92              4       99.24
  19          5       98.08              4       99.56
  20          6       95.86              4       99.74
  21          6       97.34              5       99.28
  22          6       98.30              5       99.56
  23          7       96.54              5       99.74
  24          7       97.74              6       99.34
  25          8       95.68              6       99.60
  26          8       97.10              7       99.06
  27          8       98.08              7       99.40
  28          9       96.44              7       99.62
  29          9       97.58              8       99.18
  30         10       95.72              8       99.48
  31         10       [illegible]        8       99.66
  32         10       [illegible]        9       99.30
  33         11       [illegible]        9       99.54
  34         11       97.56             10       99.10
  35         12       95.90             10       99.40
  36         12       97.12             10       99.60
  37         13       [illegible]       11       99.24
  38         13       [illegible]       11       99.50
  39         13       97.62             12       99.08
  40         14       96.16             12       99.36
  41         14       97.24             12       99.52
  42         15       95.64             13       99.20
  43         15       96.84             13       99.46
  44         16       95.12             14       99.04
  45         16       96.44             14       99.34
  46         16       97.42             14       99.54
  47         17       96.00             15       99.20
  48         17       97.06             15       99.44
  49         18       95.56             16       99.06
  50         18       96.72             16       99.34
  51         19       95.12             16       99.54
  52         19       96.36             17       99.22
  53         19       97.30             17       99.46
  54         20       95.98             18       99.10
  55         20       97.00             18       99.36
  56         21       95.60             18       99.54
  57         21       96.68             19       99.24
  58         22       95.20             19       99.46
  59         22       96.36             20       99.14
  60         22       97.26             20       99.38






Sample      P approx. 95 per cent      P approx. 99 per cent
size n        r       100P               r       100P

  61         23       96.04             21       99.02
  62         23       97.00             21       99.28
  63         24       95.70             21       99.48
  64         24       96.72             22       99.18
  65         25       95.36             22       99.40
  66         25       96.44             23       99.08
  67         26       95.02             23       99.32
  68         26       96.16             23       99.50
  69         26       97.06             24       99.24
  70         27       95.86             24       99.44
  71         27       96.80             25       99.14
  72         28       95.56             25       99.36
  73         28       96.56             26       99.04
  74         29       95.26             26       99.30
  75         29       96.30             26       99.48
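
Entries of the kind shown in Table A1 can be approximated from the binomial distribution with probability 0.5. The sketch below assumes that P is the probability that the population median lies between the rth smallest and the rth largest of n observations, and that the tabulated r is the largest one for which P is at least the nominal value. The function names are hypothetical, and the printed table appears to round slightly differently in the last digit.

```python
from math import comb


def coverage(n, r):
    """Probability that the median lies between the rth smallest and rth largest of n."""
    # Probability that fewer than r observations fall below the median.
    tail = sum(comb(n, k) for k in range(r)) / 2 ** n
    return 1.0 - 2.0 * tail


def median_limit_rank(n, confidence):
    """Largest r whose coverage is at least `confidence` (r = 1 if none qualifies)."""
    r = 1
    while r + 1 <= n // 2 and coverage(n, r + 1) >= confidence:
        r += 1
    return r, coverage(n, r)


for n in (4, 8, 12, 20):
    print(n, median_limit_rank(n, 0.95), median_limit_rank(n, 0.99))
```

For n = 4 the function returns r = 1 with P = 0.875, corresponding to the case in which 95 per cent limits cannot be found.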
















Table A2 



Confidence limits for the parameter of a binomial distribution, i.e. the
population proportion of 'successes'

See §§ 7.7, 7.8, 10.2 and 3.2-3.4. If r 'successes' are observed in a sample of
n 'trials', the confidence limits ($100\mathcal{P}_L$ and $100\mathcal{P}_U$ from eqns (7.7.1) and (7.7.2))
for $\mathcal{P}$, the proportion of 'successes' in the population (see § 3.2) from which the
sample was drawn, can be found from the table. Reproduced from Documenta
Geigy Scientific Tables, 8th edn by permission of J. R. Geigy S.A., Basle,
Switzerland. The Geigy tables give limits for all n from 2 to 1000.
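
For values of n or r not covered here, limits of this kind can be computed numerically. The sketch below assumes that the tabulated limits are the exact equal-tail binomial limits (often called Clopper-Pearson limits), an assumption that reproduces the n = 2 entries. It finds them by simple bisection, and the function names are hypothetical.

```python
from math import comb


def binomial_cdf(r, n, p):
    """P(r or fewer successes in n trials) when the success probability is p."""
    return sum(comb(n, k) * p ** k * (1.0 - p) ** (n - k) for k in range(r + 1))


def solve_increasing(func, target, iterations=60):
    """Bisection for func(p) = target on 0 <= p <= 1, where func increases with p."""
    low, high = 0.0, 1.0
    for _ in range(iterations):
        mid = (low + high) / 2.0
        if func(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


def exact_limits(r, n, confidence=0.95):
    """Equal-tail confidence limits for the population proportion (as fractions)."""
    tail = (1.0 - confidence) / 2.0
    # Lower limit: the p at which P(r or more successes) equals the tail area.
    lower = 0.0 if r == 0 else solve_increasing(
        lambda p: 1.0 - binomial_cdf(r - 1, n, p), tail)
    # Upper limit: the p at which P(r or fewer successes) equals the tail area.
    upper = 1.0 if r == n else solve_increasing(
        lambda p: 1.0 - binomial_cdf(r, n, p), 1.0 - tail)
    return lower, upper


for r in range(3):
    lo95, hi95 = exact_limits(r, 2, 0.95)
    lo99, hi99 = exact_limits(r, 2, 0.99)
    print(r, round(100 * lo95, 2), round(100 * hi95, 2),
          round(100 * lo99, 2), round(100 * hi99, 2))
```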



o 
l 

2 
3 
4 

6 
6 

7 







06% limit* 


00% UiniU 


r 


100f/ft 


100#L lOO'u 


lOO^u 100# v 


ft - 2 


0 


0-00 


0-00- 84 10 


0-00- 02 03 


1 


60 00 


120 08-74 


0-26- 00-76 


2 


10000 


16-81-100-00 


707-10000 


ft-8 


0 


000 


0 00- 70-7* 


0-00- 82 00 


1 


33 33 


0 84- 00 67 


0-17- 06 86 


2 


66-67 


9 43- 00 16 


4-14- 00 83 


2 


100-00 


20 -24-100 00 


17-10-100 00 


ft - 4 


0 


000 


0 00- 00-24 


0-00- 78-41 


1 


26 00 


0 63- 80 6ft 


0 18- 88 01 


2 


60 00 


6-76- 03 24 


2 04- 07 06 


3 
4 


7600 
10000 


10 41- 00 37 
30 76-100-00 


11 00- 00 87 
26 60-100 00 


« - 6 


0 


0-00 


0 00- 6218 


0 00- 66 34 


1 


20 00 


0 61- 7164 


0-10- 81 40 


2 


40 00 


6-27- 86-34 


2 20- 01-72 


S 


00 00 


14 66- 04-73 


8 28- 07-71 


4 


80 00 


28 36- 00 40 


18 61- 00 00 


6 


100O0 


4782-100 00 


34 66-100-00 


ft - 6 


0 


000 


0 00- 46 08 


0-00- 68-66 


1 


1667 


0-42- 64- 12 


0 08- 74-60 


2 


8833 


4-38- 77-72 


1 87- 86-64 


3 


60-00 


11 81- 88- 10 


6 63- 03-37 


4 


6667 


22 28- 06-67 


14 36- 08-18 


6 


S3 33 


36 88- 00 58 


26 40- 00 02 


1 


100-00 


64 07-100O0 


41-36-100 00 


ft - 7 



0-00 
14 20 
28 67 
42-66 
67 14 
71-43 
86-71 



0-00- 40-06 
0-86- 57-87 



367- 

990- 
18 41- 
20 04- 
42 IS- 00-64 
6004-100 00 



70-oa 

81 59 
00 10 
96 33 



ft-8 



0 00- 68 00 

0- 07- 68 40 

1- 58- 70 70 
6-63- 88 28 

11-77- 04 47 

20 30- 08 42 

31-61- 00 03 
46 01-100 00 



0 


0-00 


0 00- 86 04 


0-00- 48-43 


1 


12-60 


0-32- 62 66 


0-06- 68 16 


s 


26-00 


3 10- 65 00 


1-87- 74 22 


3 


37-60 


8 62- 75 61 


4-76- 83-03 


4 


6000 


16-70- 84 30 


0-00- 00-01 


6 


62 50 


24 40- 01-46 


16 07- 96-26 


6 


76-00 


34 91- 06 81 


26-78- 98-63 


7 


87-50 


47 36- 00 68 


36 85- 00-04 


8 


10000 


68 06-10000 


51 57-100 00 







06% UmlU 


00% UmlU 


f 


lOOf/ft 


100^ L 100# o 


100# L 100# o 


ft - 9 


0 


000 


0-00- 88 68 


0-00- 44-60 


1 


11-11 


0 28- 48 26 


0 06- 68-60 


2 


22-22 


2 81- 60-01 


1-21- 60-26 


3 


33 33 


7 40- 7007 


4-16- 78-09 


4 


44 44 


13 70- 78-80 


8 68- 86 80 


5 


65 66 


21 20- 86 80 


14*1- 0132 


6 


66 67 


29 93- 02-61 


21 01- 06 84 


7 


77-78 


30 00- 07-10 


80-74- 08-70 


8 


88 86 


61-75- 00 72 


41 50- 00 94 


0 


100 00 


66 37-100 00 


65 60-10000 






n - 10 




0 


000 


0 00- 30 86 


0 00- 41-13 


1 


10 00 


0 26- 44 60 


0-06- 64 48 


2 


20-00 


2 62- 66-61 


1O0- 64 82 


8 


80 00 


6 67- 66 25 


8-70- 73 61 


4 


40 00 


12 16- 78-76 


7-68- 80 91 


6 


60-00 


18-71- 81-20 


12 88- 87-17 


6 


6000 


26-24- 87-84 


19 -Oft 02 82 


7 


7000 


34-76- 03 38 


26 49- 96-80 


8 


80 00 


44 30- 07-48 


35-18- 06-01 


0 


oooo 


55 50- 00-76 


45 57- 00 96 


10 


100 00 


69 15-100-00 


58-87-10OO0 



ft - 11 


0 


000 


0 00- 28 40 


0O0- 88 22 


1 


909 


0 28- 41-28 


0 06- 50 86 


2 


18 18 


2 28- 51-78 


0 08- 60 86 


S 


27 27 


6 02- 60 07 


8-38- 60-38 


4 


3636 


10 08- 60 21 


6-88- 76-68 


5 


45 46 


16 76- 76-62 


11-46- 8807 


6 


54 66 


23-38- 83 26 


16-08- 88-65 


7 


68-64 


30-70- 80 07 


28 82- 08 12 


8 


72-73 


39 03- 03 08 


30 67- 06-67 


0 


81-82 


48 22- 07 72 


80 16- 00 02 


10 
N 


00-01 

100 oo 


58 72- 00-77 
71-61-100O0 


40- 14- 00-95 
61-78-100O0 



0 

1 

2 
3 
4 

5 
6 
7 
8 
0 
10 
11 
12 



n - 12 



000 
8-33 
1667 
2500 
33 33 
41-67 
50 00 
68 33 
66 67 
7600 
83 33 
91-67 
100-00 



0 21- 88 48 
2 00- 48 41 
5 40- 67 18 
0O2- 66 11 

1617- 72 33 

21 00- 78 01 

27-67- 

34 80- 

42 81- 

51-50- 

61 

73 64-100 00 



84-88 
00O8 
94 61 
07 91 



0-00- 36 60 
0O4- 47 70 
0 00- 67 20 
308- 66 62 
6-24- 72-76 
10-84- 70-15 
16-22- 84-78 
20 85- 8006 
27 26- 08 76 
34 48- 06O7 
42 71- 00-10 



64 81-100.00 










96% UmlU 


9«% limit* 


r 


lOOr/n 


100^ L IOO^d 


10O# L lOO^y 


* - 18 


0 


000 


0 00- 24 71 


0 00- 38 47 


1 


?M 


0 19- 36 03 


0 04- 44 90 


2 


1638 


102- 46 46 


0-83- 64 10 


3 


23 08 


6 04- 68-81 


2 78- 62 00 


4 


30 77 


0 00- 61 48 


6-71- 69 13 


6 


38 40 


18-86- 68 42 


9 42- 76 40 


e 


4816 


19 22- 74 87 


18-88- 81 13 


7 


5386 


25 18- 80 78 


18 87- 8617 


8 


01 54 


81 58- 80 14 


24-64- 90 58 


0 


09 23 


38-67- 90 91 


80-87- 94 29 


10 


70 92 


46 19- 94 90 


87 94- 97-22 


11 


84 02 


64-65- 98 08 


46-90- 99 17 


12 


02-81 


03 97- 99-81 


55-10- 99-96 


18 


10000 


76-29-100 00 


66 68-100.00 


* - 14 


0 


0-00 


0 00- 28 16 


0-00- 81-61 


1 


714 


018- 88-87 


0-04- 42-40 


2 


14-29 


1-78- 42-81 


0-76- 61-28 


8 


21-43 


4-86- 60 80 


2-67- 68-92 


4 


28-67 


8 89- 68 10 


6-26- 66-79 


6 


36-71 


12-76- 64 80 


8-66- 72 01 


« 


42-80 


17-66- 71-14 


12 07- 77 06 


7 


6000 


23 04- 76-96 


17-24- 82 76 


8 


6714 


28 86- 82 34 


22 34- 87 33 


9 


04 29 


85-14- 87 24 


27 99- 91-84 


10 


7148 


41-90- 91-61 


84-21- 94 74 


11 


7867 


49-20- 96-84 


41 08- 97-43 


12 


85 71 


67 19- 98 22 


48-77- 99-24 


18 


92 80 


66 13- 99 82 


67-60- 99 96 


14 


100-00 


76-84-100 00 


68-49-100 00 


M - 15 


0 


0 00 


0-00- 21-80 


0-00- 29-76 


1 


007 


017- 81-95 


0-08- 40 16 


2 


1883 


188- 40 40 


0-71- 48-63 


8 


2000 


4 83- 48 09 


2-89- 60 05 


4 


20 67 


7-79- 65 10 


4-88- 62 78 


6 


33-33 


11-82- 61 02 


8-01- 68 82 


6 


40 00 


10 84- 67-71 


11 70- 74-89 


7 


48 87 


21-27- 78-41 


16-87- 79 49 


8 


63 88 


20 69- 78 78 


20-51- 84 13 


0 


00 00 


32 29- 88 00 


26 01- 88 80 


10 


08-87 


38 88- 8818 


3118- 91 99 


11 


73-38 


44 90- 92 21 


87 27- 96 12 


12 


80 00 


61-91- 96 67 


48 96- 97-61 


18 


80 07 


69 54- 98-84 


61-87- 99-29 


14 


98 38 


08 06- 99 88 


59 84- 99 97 


lb 


100-00 


78 20-100-00 


7024-100 00 


• - 10 


0 


0-00 


0 00- 20-69 


0 00- 28-19 


1 


0-26 


016- 30 28 


0-08- 88-14 


2 


12 50 


1-66- 88 86 


0-67- 40 28 


8 


18 76 


4 06- 46 06 


2-23- 63 44 


4 


25 00 


7-27- 62 38 


4 66- 69-91 


6 


31 25 


11-02- 68 68 


7-46- 66 86 


« 


3750 


16-20- 64-67 


10 88- 71-82 


7 


48-75 


19-76- 70-12 


14-71- 78-38 


8 


60 00 


24 66- 76 36 


18-97- 81 OS 


8 


6026 


29 88- 80-25 


23 02- 85 29 


10 


02 50 


36 48- 84 80 


28 68- 89 14 


11 


08 76 


41 84- 88-98 


84-16- 92 66 


12 


7600 


47-62- 92 73 


40 09- 96-46 


IS 


81-26 


54-85- 95-95 


40 66- 97-77 


14 


87 60 


61-66- 98 46 


68-72- 99 33 


16 


98-76 


89-77- 99 -84 


61-88- 99-97 


16 


100 00 


79-41-100 00 


71-81-100-00 


n - 17 


0 


000 


0-00- 19-61 


0 00- 26-78 


1 


5-88 


016- 28-69 


0-08- 86 80 


2 


11-70 


1 40- 80 44 


0-68- 44 18 


8 


17-86 


8 80- 48 48 


2 09- 61 04 


4 


28 53 


6-81- 49-90 


4 28- 57-82 



96% UmJU 



99% UmlU 



r lOOr/ii 10Q^ L 100^. 100* L 10O*„ 



5 
6 

7 

8 

9 
10 
11 
12 
13 
14 
16 
16 
17 



0 
1 
2 
S 
4 
6 
0 
7 
8 

10 
11 
12 
18 
14 
16 
16 
17 



n - 17 (oonttaiMKl) 



29 41 
36-29 
4118 
47-06 



04 71 
70 69 
70 47 
82 36 
88 24 
9412 
100 00 



000 
6-00 
10 00 
1600 
20 00 
25-00 
80-00 
86 00 
40 00 
46 00 
50 00 
65 00 
0000 
66-00 
7000 
76-00 
80-00 



10-31- 56 90 
14-21- 61-67 

18-44- 67 08 

22 98- 72 19 

27-81- 77 02 

32-92- 81 66 

38 33- 86 79 

44 04- 89 09 

60 10- 93 19 

60-67- 90 20 

68-60- 98 54 

71-31- 99 86 
80 49-100 00 



0 00- 

0 13- 

1 23- 
8 21- 
6 78- 
8 06- 

11-89- 
16-89- 
1912- 
28-00- 
27 20- 
81 58- 
86 05- 
40 78- 
46 72- 
60-90- 
60-34- 
6211- 



1084 
24-87 
81 70 
87 89 
43 66 
49 10 
54-28 
6922 
03 96 
88-47 
72-80 
70-94 
80 88 
84 81 
8811 
91 84 
94-27 
98-79 



6-97- 6810 
10-14- 68 40 
18-71- 73-44 
17 64- 78 07 
21-98- 82 30 
20 50- 80-29 
31 54- 89-88 
30 90- 93 03 
42 68- 96-74 
48 98- 97 91 
56 87- 99-37 
08-70- 99-97 
7322-100-00 



m - 18 


0 


0-00 


0-00- 18-63 


0 00- 26-60 


1 


650 


0-14- 27-29 


0-08- 84-68 




1111 


1-88- 84 71 


0-69- 42 17 


3 


10-07 


8-68- 41-42 


1-97- 48 84 


4 


22 22 


6-41- 47 64 


4-00- 54-92 


5 


27 78 


9 09- 68 48 


6-54- 60-66 


6 


88-88 


13-84- 69 01 


9-61- 66-79 


7 


8889 


17-80- 64 25 


12-84- 70-68 


8 


44-44 


21-68- 69-24 


16-49- 76 -20 


9 


50 00 


26 02- 73-98 


20-47- 79-68 


10 


66 50 


80 76- 78-47 


24 74- 88-61 


11 


61-11 


86-75- 82 70 


29-32- 8716 


12 


66-67 


40-99- 80 00 


84-21- 90 49 


IS 


72 22 


46-62- 90 81 


89-46- 98 40 


14 


77-78 


62 80- 93-69 


46-08- 90 00 


16 


83 S3 


58-68- 96 42 


61-16- 98 08 


16 


88-89 


86-29- 98 62 


67 88- 99-41 


17 


94-44 


72-71- 99 80 


66 87- 99-97 


18 


10000 


81 47-100 00 


74-6O-1O000 



11-19 


0 


000 


0-00- 17-64 


0-00- 24-34 


1 


6-20 


0-18- 20 OS 


0 08- 38 11 


2 


10-53 


1-80- 33-14 


0-66- 40-87 


8 


16 79 


8-88- 89 58 


1-88- 40-82 


4 


21 06 


6 06- 46 67 


8-78- 62-71 


6 


20 82 


9-16- 61 20 


BE SS 


6 


81-58 


12-68- 56-56 


7 


30 84 


10-29- 61-64 


12 07- 68 -09 


8 


42 11 


20 25- 00 60 


16-49- 72-60 


9 


47-37 


24 46- 71 14 


19-19- 76-84 


10 


52-68 


28-88- 76-56 


2818- 80-81 


11 


57-89 


S3 50- 79-76 


27-40- 84-61 


12 


63 16 


88 80- 88 71 


31-91- 87-98 


13 


08-42 


48 46- 87 42 


86-71- 91 04 


14 


73 08 


48 80- 90 85 


41-82- 93-83 


16 


78 96 


64-48- 98 96 


47-29- 96-22 


16 


84 21 


60 42- 9602 


53 18- 98-14 
69 03- 99-44 


17 


89-47 


66 80- 98 70 


18 
19 


100 00 


78 97- 99 87 
82 35-10000 


66 89- 99-97 
76-68-100-00 


n - 20 



000- 
0-03- 

0- 68- 

1- 76- 
3 68- 
5 88- 



11 89- 
14-60- 
18-06- 
21-77- 
26-72- 
29 91- 
84 84- 
89-04- 
44 02- 
4984- 



23 27 
31-71 
88-71 
44-96 
60 60 
55-98 
60 96 



70-09 
74 28 
78-23 
81-94 
86-40 
88 01 
91-64 
9417 
96 42 
98 24 






•*% 



88% HmiU 



r 


100r/» 


IOO^l 100*0 10O#i. lOft^u 


n — 20 (continued) 


18 
10 

to 


Sfoo 

100-00 


68-80- 88-77 61-28- 88 47 
7518- 88 87 68 28- 8887 
83 16-100-00 78 7J- 100-00 






M - 21 



0 


0-00 


1 


4 76 


X 


8-62 


3 


14-29 


4 


18-06 


5 


28 81 


8 


28 67 


7 


S3 S3 


8 


38 10 


8 


42-86 


10 


47 62 


11 


52 88 


12 


67 14 


It 


61-80 


14 


86-67 


16 


71 48 


18 


78-18 


17 


80-86 


18 


86 71 


19 


80 48 


20 


86 24 


21 


100 00 



040- 16-U 
0-12- 23 82 
1 17- 80-88 
306- 36 34 
6 46- 4141 



8 22- 
11-28- 
14 58- 
18 11- 
21 82- 
26-71- 
28-78- 
84 02- 
88-44- 
430S- 
4782- 
52 88- 
58 08- 
63-66- 
68 62- 
7618- 



4717 
52 18 
68-87 
61 56 
86 88 
70 22 
74-28 
78 18 
81 88 

85 41 
88 72 
81-78 
84 55 

86 85 
88 83 
86-88 



67-72 
71 85 



83 88-100 00 



22 80 
80-43 

0 60- 87-18 
1-88- 48 22 
3 38- 48-76 
6-68- 63-82 
8-01- 68 78 
10-78- 63-37 
18-81- 
1747- 
20 56- 75-76 
24-24- 78 46 
28 15- 82 83 
32 28- 86 19 
86 63- 88 22 
41-22- 81 89 
48-08- 84 47 
61-24- 86-81 
66 78- 88 32 
62 82- 88 50 
68 67- 88 88 
77-70-100 00 



n - 22 



0 


0-00 


1 


4 55 


2 


949 


3 


1884 


4 


IS 18 


6 


22 78 


6 


2727 


7 


81 82 


8 


38 36 


9 


4041 


10 


46 45 


11 


6040 


12 


54 55 


18 


6808 


14 


68 84 


16 


68 18 


16 


72 78 


17 


77-27 


18 


81 82 


18 


86 88 


20 


8081 


21 


8646 


22 


100 00 



000- 16 44 

0 12- 22 84 

1 12- 28 16 

2 81- 34 91 

6 18- 40 28 

7 82- 45 37 
10 78- 60 22 
18 88- 54 87 
17 20- 58-84 
20 71- 68 65 
24 38- 67 79 
28 22- 71 78 
82 21- 76 61 
36 35- 79 29 
40 66- 82 80 
46-18- 86 14 
48-78- 88 27 
54 63- 82 18 
68 72- 84 81 
65 08- 97 08 
70-84- 88-88 
77 16- 89 88 
84-56-100 00 



040- 21 40 
0 02- 29 24 

0- 48- 85 77 

1- 80- 41 81 
8 28- 46 88 

6 28- 5241 

7 61- 66 74 
10 24- 81 23 
18 1 0- 66 49 
1818- 68 64 
18 48- 78 40 
22 88- 77 07 
26 60- 80 54 
30-46- 88 82 
34-61- 88 80 
88 77- 88 78 
43 26- 82-38 
47-88- 84-74 
6341- 86-77 
68 88- 88-40 
64 23- 89 62 
70-78- 98 88 
7860-10040 



n = 23



0 


0 00 


1 


4'86 


2 


8-70 


8 


18 04 


4 


17 39 


6 


21-74 


6 


2608 


7 


SO 48 


8 


84 78 


8 


39 18 


10 


48 48 


11 


47 83 


12 


62 17 


18 


66 52 


14 


60 87 


15 


66 22 


16 


69 67 


17 


78 91 


18 


78 26 


18 


8261 


20 


88-88 


21 


81 80 


22 


8646 


28 


10040 



81 46 

86 61 
69 41 



040- 14 82 
0 11- 21 85 

1- 07- 2844 

2- 78- 83 59 
4-96- 88 78 
7-48- 43 70 

10 28- 48 41 
18-21- 62 92 
16 88- 67-27 
19 71- 
23 18- 
26 82- 
30 68- 78-18 
34 48- 76 81 
38-64- 80 29 
42-78- 83 62 
47 08- 86 79 
61-58- 88 77 
68 80- 82-64 
61 22- 86 06 
66-41- 87 22 
71-88- 98 93 
78 06- 99 89 
8618-10040 



0 00- 20 68 

0 02- 28 14 
0-46- 34 46 

1 58- 4012 
3 08- 46-34 
542- 60-22 

7- 26- 64 83 

8- 74- 68 21 
12 48- 63 88 
16 87- 67 86 
18-48- 71-16 
21-78- 74 79 
26-21- 78 24 
28 84- 81 62 
32 64- 84 63 
86 82- 87-64 
40-78- 80 26 
46 1 7- 82 76 
49-78- 84 88 
54 66- 86 92 
69 88- 88 47 
66 54- 88 64 
71-88- 89 98 
78-42-10040 







                 95% limits          99% limits
 r    100r/n   100P_L  100P_U     100P_L  100P_U


n = 24


0 


040 


040- 14-28 


040- 18-81 


1 


4-17 


0-11- 21 14 


(Htt- 27- 18 


2 


8 83 


148- 2740 


0-44- S3 24 


3 


12 50 


248- 82-36 


1 46- 88-78 


4 


16 67 


4-74- 87-88 


246- 43 78 


6 


20 83 


7-18- 42-16 


4 78- 48 56 


I 


25-00 


8-77- 48-71 


8-82- 6344 


7 


2817 


12 62- 6148 


8-30- 67 82 


8 


83 S3 


15 68- 66 32 


11-88- 61-40 


A 


3 i 'WJ 




1 A A**.- AA VI 

1% IKr— w w 


10 


41 87 


22 11- 63 36 


17-68- 88-04 


n 


45 83 


25 56- 67 18 


20-7O- 72 62 


12 


5OO0 


28 12- 708m 


23 86- 76-04 


18 


64 17 


32 82- 74-45 


27 38- 7V SO 


14 


58 33 


36 64- 77 88 


30 88- 82-41 


16 


62 50 


40 58- 81-20 


84-70- 86 36 


16 


66-87 


44 68- 84-37 


8840- 88 12 


17 


70 83 


48-81- 87-38 


4248- 80 70 


18 


7500 


53 28- 90 23 


4848- 8348 


18 


7817 


57-86- 82 87 


61 46- 86-21 


20 


83 S3 


62 62- 86 26 


66-21- 8746 


21 


87 60 


67 64- 97 34 


61 27- 88-54 


22 


81 87 


7840- 88 97 


66-78- 88-68 


23 


9583 


78 88- 88 88 


72-78- 89 88 


24 


100 00 


8675-10040 


8018-100-00 



n = 25


0 


000 


040- 13-72 


040- 19 10 


1 


4 00 


0 10- 20 86 


042- 28-18 


2 


800 


0 88- 2643 


0 42- 82 10 


8 


12 00 


2 65- 81 22 


1-40- 37-48 


4 


16 00 


4 64- 8648 


2 82- 42-85 


6 


20 00 


6 88- 40 70 


4 68- 48 88 


6 


24 00 


8 88- 46 13 


6-68- 61-36 


7 


28 00 


1247- 48 38 


8-88- 65-68 


8 


8200 


14 86- 68 50 


11-36- 58-62 


8 


88 00 


1747- 67 48 


18-88- 63-36 


10 


40-00 


2118- 61 88 


16-78- 6742 


11 


4440 


24 40- 8647 


18-74- 70-54 


12 


48 00 


27 80- 6848 


22 83- 7841 


18 


52 00 


81 81- 72 20 


26 07- 77 17 


14 


56 00 


84 93- 75 60 


28 46- 80-26 


15 


60 00 


38 67- 78-87 


32 88- 83-21 


16 


6440 


42-52- 82 03 


36 66- 86 01 


17 


68 00 


46 50- 8646 


40-48- 88 65 


18 


7240 


60-61- 87 93 


44 47- 81 11 


18 


78 00 


64 87- 80 64 


48 64- 83 37 


20 


8040 


68 80- 88 17 


6842- 95 41 


21 


84 00 


63 92- 85-46 


5745- 97 18 


22 


8840 


68 78- 87-45 


82-67- 88 60 


23 


8240 


78 87- 89 42 


87 80- 89 68 


24 


8840 


7945- 88 80 


7342- 89 98 


28 


10040 


86-28-10040 


8040-10040 


n = 26


0 


000 


0 00- 18 28 


040- 18 44 


1 


8 86 


0-10- 19 64 


0-02- 26 28 


t 


748 


0-86- 25 13 


0 41- 8144 


1 


11-64 


2-45- 80 16 


1 84- 86 21 


4 


16 88 


4 86- 34 87 


2 71- 4140 


6 


1923 


645- 89 85 


4-40- 46 60 


6 


2848 


847- 48 66 


6-85- 49-77 


7 


26 82 


1147- 47-78 


8 62- 58-85 


8 


80-77 


14-88- 61-78 


10-87- 67-75 


8 


34-62 


17-21- 66-67 


18 88- 61-60 


10 


88-46 


20 23- 69 43 


16 06- 6610 


11 


42 31 


28-85- 63-08 


18-86- 68-57 


12 


46 16 


26-68- 66 68 


21-81- 7141 


IS 


6040 


29 98- 7047 


24 88- 7511 


14 


68 86 


S3 87- 78-41 


28 08- 7819 


16 


67 68 


38 92- 7846 


81-48- 8114 


16 


61-64 


40-67- 79-77 


8440- 88 96 


17 


8688 


44-88- 82 79 


88-50- 86 62 


18 


68 23 


48 21- 86 67 


42 26- 89 13 


18 


78 08 


62 21- 88 43 


46 16- 81-48 


20 


76 92 


56 36- 91 48 


60 28- 8848 


21 


8077 


60-85- 98 45 


54 40- 8540 


22 


8442 


66 18- 96-64 


6840- 87-28 










                 95% limits          99% limits
 r    100r/n   100P_L  100P_U     100P_L  100P_U


n = 26 (continued)


23 


88 44 


69-85- 97 66 


08-79- 96-00 


24 


92-81 


74-87- 99-06 


06-96- 99 69 


25 


9616 


80 30- 99-90 


74-71- 99-98 


20 


100 00 


86 77-100 00 


81-56-100-00 


n = 27


A 

0 


n.ftft 


yj is* 1 4 


A.fWX 1 AO 


1 


O /V 


n /Vk 1 0.07 
U vV— it) ¥/ 




o 

I 


f 


U VI— i4 IV 


V oir *HJ*VH 


■ 


1111 


9 QO. j A 
S 3 k>-* JfcV 1 D 




4 


14-81 


4 19- 83 73 


2 60- 39-73 


6 


18 62 


6 SO- 88 08 


4-23- 4411 


a 


22-22 


8-02- 42 26 


010- 48 28 


7 
8 


26-93 
29 88 


iwfc 50 18 


10-42- 66 08 


9 


33 S3 


10-62- 53 96 


12 83- 59 76 


10 


3704 


19-40- 67 08 


16 38- 63-28 


11 


40 74 


22 89- 01 20 


18 07- 06 69 


12 


44 44 


26 48- 04-07 


20 -88- 69-98 


IS 


48-16 


28 97- 68-06 


23 81- 78 14 


14 


51 86 


SI 96- 71 33 


26 86- 70 19 


16 


66 66 


86 83- 74-62 


SO 02- 79 12 


18 


69-20 


S8 80- 77-01 


83-81- 81-98 


17 


62 90 


42 87- 80-00 


86-72- 84-62 


18 


68-67 


48 04- 83-48 


40 26- 87 17 


19 


70-37 


49-82- 8026 


43-92- 89 58 


20 


7407 


68-72- 88-89 


47-74- 91 88 


21 


77-78 


67-74- 91 38 


61-72- 98 90 


22 


81-48 


01 92- 98 70 


56 89- 96-77 


£3 


86-19 


66 27- 96-81 


00 27- 97-40 


24 


88 89 


70 84- 97-06 


04 93- 96-71 


26 


92 59 


76-71- 09-00 


09 96- 99 01 


2< 


96 30 


81 OS- 99-91 


76 54- 99 98 


27 


100 00 


87 23-100 00 


82 18-100-00 


n = 28


0 


0-00 


0-00- 1234 


0K»- 17 24 


1 


8-67 


0 09- 18 86 


002- 28 69 


2 


7-14 


0 88- 23-60 


0 38- 29 11 


3 


10-71 


2 27- 28 23 


1-26- 83 99 


4 


14-29 


4 03- 82 67 


2-61- 88 68 


6 


17 86 


0-00- 36 89 


4-07- 42 80 


« 


21-43 


8 80- 40 95 


5-86- 40-87 


7 


25 00 


10-09- 44-87 


7-86- 50-70 


8 


28-67 


18-22- 48 67 


10 02- 64 49 


9 


32- 14 


16-88- 52 36 


12 82- 58 08 


10 


86-71 


18-04- 56 98 


14-77- 01 66 


11 


89 29 


21 60- 69 42 


17 33- 04 90 


12 


42-86 


24-40- 62-82 


20-02- 08 14 


IS 


40-48 


27 61- 00 18 


22 82- 71 20 


14 


50 00 


80-65- 69 86 


26 72- 74 28 


16 


63 67 


38 87- 72-49 


28-74- 77-18 


10 


6714 


87 18- 76 54 


31 86- 79-98 


17 


00-71 


40-58- 78-50 


36 10- 82 67 


18 


04 29 


44 07- 81-30 


38 45- 86-23 


19 


67-86 


47-06- 84 12 


41-92- 87-08 


20 


71-43 


51-83- 86 78 


46-61- 89 98 


21 


7500 


66 1 8- 89 31 


49 2 4- 92 14 


22 


78-67 


69 05- 91-70 


63 13- 94 14 


23 


82 14 


08 11- 98 94 


57-20- 95 93 


24 


86-71 


07 83- 96-97 


01-47- 97 49 


26 


89 29 


71-77- 97-78 


66-01- 96 76 


26 


92 86 


76 50- 9912 


70-89- 99 62 


27 


96-48 


81-06- 99-91 


70 31- 99 98 


28 


100 00 


87 00-100 00 


82 76-100 00 


n = 29


0 


000 


000- 11-94 


0-00- 10-70 


1 


8-45 


009- 17-70 


0-02- 22 96 


2 


6-90 


0-86- 22-77 


0 SO- 28 28 


3 


10 34 


2 19- 27 86 


1-20- S2 98 


4 


18 79 


S 98- 81-00 


2 42- 87 40 


6 


17 24 


6-85- 36-77 


8 92- 41 67 


« 


20-09 


7-99- 89 72 


6 66- 46 64 


7 


2414 


10-80- 48-54 


7-68- 4*83 


8 


2769 


12 73- 47-24 


9 04- 62 99 


9 


81-03 


16-28- 60 88 


11 86- 66-61 


10 


84-48 


17-94- 54 S3 


14 20- 69-91 




                 95% limits          99% limits
 r    100r/n   100P_L  100P_U     100P_L  100P_U


n = 29 (continued)



11 


87-93 


20-69- 


67-74 


10 06- 68-20 


12 


41 88 


23 62- 01 06 


19-28- 00 88 


IS 


44-83 


20-46- 


64-81 


21-91- 69-46 


14 


48-28 


29-45- 


67-47 


24-09- 72-48 


16 


61-72 


82 63- 


70-55 


27-57- 76-81 


10 


6617 


36-69- 


73-66 


80-54- 78 09 


17 


58 02 


88-94- 


76-48 


83-82- 80-77 


18 


6207 


42 26- 79 81 


36 80- 88-84 


19 


65 62 


46-07- 


82 06 


40-09- 86-60 


20 


68-97 


49-17- 


84-72 


43 49- 88 15 


21 


72-41 


62-76- 87-27 


47-01- 90 30 


22 


76-80 


60-46- 89 70 


50 07- 92-44 


23 


79-81 


00-28- 


92-01 


64 40- 94-36 


24 


82-70 


04-23- 


94-16 


68-48- 96 08 


26 


80-21 


68-34- 96-11 


02-00- 97-68 


20 


89-66 


72-06- 97-81 


67-02- 98-90 


27 


93-10 


77-28- 99-16 


71-77- 99-04 


28 


90-66 


82-24- 99 91 


77-04- 99-98 


29 


10000 


88-06-1 


10000 


83 80-100-00 


n = 30


0 


000 


0-00- 


11-57 


0-00- 10-19 


1 


a 38 


0-08- 


17-22 


0-02- 22-27 


2 


007 


0-82- 


22 07 


0-35- 27 40 


3 


1000 


2 11- 


20-63 


116- 32-08 


4 


18 M 


8-76- 


30-72 


2-83- 80 84 


5 


1007 


6 04- 


S4-72 


8-78- 40-40 


0 


2000 


7-71- 


88-57 


6-45- 44-28 


7 


23-83 


993- 


42 28 


7-29- 47-99 


8 


20 07 


12-28- 


46 89 


9-29- 51-66 


• 


80 00 


14-73- 


49-40 


11 42- 66-01 


10 


83-83 


17 29- 


62 81 


18-07- 68-34 


11 


36 67 


19-93- 


6014 


10 04- 01-67 


12 


40-00 


22-66- 


59-40 


18-50- 04-70 


IS 


48-88 


26-46- 


02 57 


21-07- 07-78 


14 


40-07 


28-34- 


66 07 


23 73- 70-07 


16 


60-00 


31-80- 


68-70 


20-48- 78-62 


10 


58-83 


84 83- 


71-86 


29-83- 70-27 


17 


60-07 


87-43- 


74 54 


82-27- 78-93 


18 


0000 


40-60- 


77-34 


86 30- 81-60 


19 


63 33 


43-80- 


80-07 


38 43- 83-90 


20 


6607 


47-19- 


82 71 


41 06- 86-83 


21 


70 OO 


60-60- 


86 27 


44-99- 88 68 


22 


78 88 


5411- 


87-72 


48-44- 90-71 


23 


70 07 


67-72- 


90 07 


62-01- 92 71 


24 


80-00 


01-48- 


92-29 


66 72- 94-66 


26 


83 S3 


65 28- 


94-36 


59-60- 96-22 


20 


86-67 


69 28- 


96-24 


63-06- 97-07 


27 


9000 


78-47- 


97-89 


07-97- 98 84 


28 


93 88 


77-93- 


99 18 


72-00- 99-05 


29 


9607 


82-78- 


99-92 


77-73- 99-98 


80 


100-00 


88-43-100O0 


88-81-100-00 



n = 1000 (extract)



0 


0 00 


0-90- 0-37 


040- 9$t 


1 


0 10 


0 60- 0 se 


0 00- 0 74 


2 


0 to 


00t- 9-7t 


0 01- 0 8? 


3 


0 to 


0 06- 0-87 


0 03- 109 


4 


0*0 


0-11- l ot 


0 07- l iS 


6 


0 50 


010- 1-17 


0 09- 1-89 


6 


0 60 


0 22- 1-31 


0-14- 1-64 


7 


0 70 


0 26- 1-44 


018- 109 


8 


0 80 


0 34- 1-68 


0-24- 1 83 


9 


0 00 


0-41- 1-71 


0 29- 1-97 


10 


1-00 


0 48- 1 84 


0 36- 211 


11 


110 


0 56- 1-97 


0-41- 2 26 


12 


1-20 


0 02- 2 09 


0-48- 2 88 


18 


1-80 


0 69- 2-22 


0-54- 2 51 


14 


1-40 


0-77- 2 84 


0-60- 2-06 


16 


1 60 


0-84- 2-47 


0 07- 2-78 


16 


160 


0 92- 2-69 


0-74- 2-91 


17 


1-70 


0-99- 2-71 


0-81- 8 04 


18 


1 80 


1-07- 2-84 


0 88- 8-17 


19 


1 90 


116- 2 90 


0 95- 3 80 


20 


2-00 


1 22- 8-08 


1 02- 8 42 


■ 




• 


• 


• 


■ 


• 


• 


■ 




• 


• 






Table A3 



The Wilcoxon test for two independent samples 

See § 9.3. The sample sizes are n₁ and n₂. If the sample sizes are not equal, n₁
is taken as the smaller. If the rank sum for sample 1 (that with n₁ observations)
is equal to or less than the smaller tabulated value, or equal to or greater than the
larger tabulated value, then P (two tail) is equal to or less than the figure at the
head of the column. If the null hypothesis were true, P would be the probability
of observing a rank sum equal to or greater than the larger figure, or equal to or
less than the smaller. If one or both samples contain more than 20 observations,
use the method described at the end of § 9.3. M. I. Sutcliffe's table reproduced
from Mainland (1963) by permission of the author and publisher.
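
For small samples the probability described above can be checked by direct enumeration. The following is a minimal sketch, not the method used to compile the table: it assumes there are no tied observations, and the function name `rank_sum_two_tail_p` is only illustrative. It relies on the fact that each pair of tabulated values adds up to n₁(n₁ + n₂ + 1).

```python
from fractions import Fraction
from itertools import combinations


def rank_sum_two_tail_p(n1, n2, rank_sum):
    """Exact two-tail P for the rank sum of sample 1 (assumes no ties)."""
    # The smaller and larger critical sums are symmetric about the mean,
    # so they always add up to n1 * (n1 + n2 + 1).
    total = n1 * (n1 + n2 + 1)
    lower = min(rank_sum, total - rank_sum)
    upper = total - lower
    ranks = range(1, n1 + n2 + 1)
    # Under the null hypothesis every choice of n1 ranks is equally likely.
    sums = [sum(c) for c in combinations(ranks, n1)]
    extreme = sum(1 for s in sums if s <= lower or s >= upper)
    return Fraction(extreme, len(sums))


if __name__ == "__main__":
    # n1 = 2, n2 = 5, rank sum 3: only {1, 2} and {6, 7} are as extreme.
    print(rank_sum_two_tail_p(2, 5, 3))
```

The enumeration grows as the binomial coefficient, so this is practical only for the small sample sizes that the table covers.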



                 P (approx.)                           P (approx.)
 n₁   n₂     0.10    0.05    0.01       n₁   n₂     0.10    0.05    0.01


2 


4 








3 18 


16; 61 


13; 53 


8; 58 




5 


3; 13 







19 


16; 53 


13; 56 


9; 60 




6 


3; 15 







20 


17; 55 


14; 58 


9; 63 




7 


3; 17 


















g 


4; IB 


3; 19 





4 4 


11; 25 


10; 26 















6 


12; 28 


11; 29 






9 


4; 20 


3; 21 




6 


13; 31 


12; 32 


10; 34 




10 


4; 22 


3; 23 





7 


14; 34 


13; 35 


10; 38 




11 


4; 24 


3; 25 





8 


15; 37 


14; 38 


11; 41 




12 


6; 25 


4; 26 















13 


5; 27 


4; 28 





9 


16; 40 


14; 42 


ft a a* 

U;45 












10 


17; 43 


15; 45 


12; 48 




u 


6; £8 


4; 30 





11 


18; 46 


16; 48 


12; 62 




15 


6; 30 


4; 32 





12 


19; 49 


17; 51 


13; 55 




16 


6; 32 


4; 34 




13 


20; 62 


16; 54 


13; 69 




17 


6; 34 


5; 35 














18 


7; 35 


5; 37 




14 


21; 55 


19; 57 


14; 62 












16 


22; 68 


20; 60 


15; 66 




19 


7; 37 


5; 39 


3; 41 












20 


7; 39 


6; 41 


3; 43 


16 


24; 60 


21; 63 


16; 69 












17 


26; 63 


21; 67 


16; 72 


3 


3 


6; 16 






18 


26; 66 


22; 70 


16; 76 




4 


6; 18 
















5 


7; 20 


6; 21 




19 


27; 69 


23; 73 


17; 79 




6 


8; 22 


7; 23 




20 


28; 72 


24; 76 


18; 82 




7 


8; 26 


7; 26 






















5 5 


19; 36 


17; 38 


16; 40 




8 


9; 27 


6; 26 




6 


20; 40 


IB; 42 


16; 44 




9 


10; 29 


8; 31 


6; 33 


7 


21 ; 44 


20; 45 


16; 49 




10 


10; 32 


9; 33 


6; 36 


8 


23; 47 


21; 49 


17; 53 




11 


11; 34 


9; 36 


6; 39 


9 


24; 61 


22; 53 


18; 67 




12 


11; 37 


10; 38 


7; 41 




















10 


26; 54 


23; 57 


19; 61 




13 


12; 89 


10; 41 


7;44 


11 


27; 68 


24; 61 


20; 65 




14 


13; 41 


11; 43 


7; 47 


12 


28; 62 


26; 64 


21; 69 




15 


13; 44 


11; 46 


8; 49 


13 


30; 66 


27; 68 


22; 73 




16 


14; 46 


12; 48 


8; 52 


14 


31; 69 


28; 72 


22; 78 




17 


15; 48 


12; 61 


8; 56 











i 






                 P (approx.)
 n₁   n₂     0.10    0.05    0.01


IK 
10 


Mi /Z 


9Q. 7 A 


91 • fi9 

zo ; oz 


16 


34; 78 


30; 80 


24; 86 


17 


86; 80 


32; 83 


26; 90 


18 


37; 83 


33; 87 


28:94 


19 


38; 87 


34; 91 


27; 98 


fin 




OK . nc 
OO , VO 


9B- in" 
zo; iuz 


6 6 


28; 60 


26; 62 


23; 66 


7 


29; 66 


27; 67 


24; 60 


8 


31; 69 


29; 61 


26; 86 


e 


33; 63 


31; 66 


26; 70 


10 






97 • 7K 


11 


37; 71 


34; 74 


28; 80 


12 


38; 76 


36; 79 


30; 84 


13 


40; 80 


37; 83 


31; 89 


14 


42; 84 


38; 88 


32; 94 


la 
10 


AA- Bfl 

»*; oo 


AO- 09 


OO 
66, VV 


16 


46; 92 


42; 96 


34; 104 


17 


47; 97 


43; 101 


36; 108 


18 


49; 101 


46; 106 


37; 113 


19 


61; 106 


46; 110 


38; 118 




o j , ly/v 


4ft • 1 1 A. 


jy, iz3 


7 7 


39; 66 


36; 69 


32; 73 


8 


41; 71 


38; 74 


34; 78 


9 


43; 76 


40; 79 


36; 84 


in 

1U 


AR- SI 
*o, Ol 


19- flA 
*Z, O* 


1~ ■ AO 

J ' , ow 


11 


47; 86 


44; 89 


38; 96 


12 


49; 91 


46; 94 


40; 100 


13 


62; 96 


48; 99 


41; 106 


14 


64; 100 


60; 104 


43; 111 


16 


66; 106 


62; 109 


44; 117 


16 


68; 110 


64; 114 


46; 122 


17 


61; 114 


66; 119 


47; 128 


18 


63; 119 


68; 124 


49; 133 


19 


66; 124 


60; 129 


60; 139 


20 


67; 129 


62; 134 


62; 114 


8 8 


61; 86 


49; 87 


43; 93 


9 


64; 90 


61; 93 


46; 99 


10 


68; 96 


68; 99 


47; 106 



»1 «9 



10 



11 



13 
14 
16 
16 
17 



9 
10 
11 
12 
13 

14 

16 
16 
17 
18 

19 

20 

10 
11 
12 
13 
14 

16 
16 
17 
18 
19 



0.10



11 69; 101 

12 82; 108 



P (approx.)

0.05

66; 106 
68; 110 



64; 112 
67; 117 
69; 123 
72; 128 
76; 133 



66; 106 
89; 111 
72; 117 
76; 123 
78; 129 

81; 136 
84; 141 
87; 147 
90; 163 
93; 169 

96; 165 
99; 171 

82; 128 
86; 134 
89; 141 
92; 148 
96; 164 

99; 161 
103; 167 
106; 174 
110; 180 
113; 187 



60; 116 
62; 122 
66; 127 
67; 133 
70; 138 



18 77; 139 72; 144 

19 80; 144 74; 160 

20 83; 149 77; 166 



62; 109 
66; 116 
66; 121 
71; 127 
73; 134 

76; 140 
79; 146 
82; 162 
84; 169 
87; 166 

90; 171 
93; 177 

78; 132 
81; 139 
84; 146 
88; 162 
91; 169 

94; 166 
97; 173 
100; 180 
103; 187 
107; 193 



11 100; 163 

12 104; 160 

13 106; 167 



96; 167 
99; 166 

103; 172 



0.01

49; 111 
61; 117 

63; 123 
64; 130 
68; 136 
68; 142 
60; 148 

82; 164 
64; 180 
66; 166 

66; 116 
68; 122 
61; 128 
63; 136 
66; 142 

67; 149 
69; 166 
72; 162 
74; 169 
76; 176 

78; 183 
81; 189 

71; 139 
73; 147 
76; 164 
79; 161 
81; 169 

84; 176 
86; 184 
89; 191 
92; 198 
94; 206 



20 117; 193 110;200 97; 213 



87; 166 
90; 174 
93; 182 









                 P (approx.)                           P (approx.)
 n₁   n₂     0.10    0.05    0.01       n₁   n₂     0.10    0.05    0.01


11 


14 


112; 174 


106 


180 


96; 190 


14 


19 


i rt. rt rt rt a 

192; 284 


% rt rt rt rt n 

183; 293 


1 rt rt rtrtrt 

168; 308 




1ft 


1 16; 181 


110 


187 


99; 198 




20 


1 rt M rt rt rt 

197; 293 


■ rtrt rtrtrt 

188; 302 


172; 318 




16 


■ A A * A~\ t \ 

120; 188 


113 


196 


V rt rt rt A A 

102; 206 


16 


15 


v rt rt rt w rt 

192; 273 


184; 281 


171 ; 294 




17 


-i I 1 rt 1 n/i 

123; 196 


117 


202 


105; 214 




10 


* rt rt rt rt rt 

197; 283 


V A/\ /%rt rt 

1 90 ; 290 


176; 305 




18 


127; 203 


121, 


209 


i rtrt rt rt rt 

108; 222 




17 


rt rt rt rt A rt 

203; 292 


1 r A rtrt 

196; 300 


1 rt A ft y r 

180; 315 




19 


131 ; 210 


124, 


217 


111 ; 230 




18 


rtrtrt rtrtrt 

208; 302 


A AA A % rt 

200; 310 


i o a rt A a 

184; 326 




20 


135; 217 


128; 


224 


114; 238 




19 


214; 311 


A rt ■* AAA 

205; 320 


1 A rt A A a 

189; 336 


12 


12 


120; 180 


116; 


185 


106; 195 




20 


rtrtrt rtn/\ 

220; 320 


rt f rt. AAA 

210; 330 


193; 347 




13 


125; 187 


119; 


193 


109; 203 














14 


129; 195 


123; 


201 


112; 212 


i rt 

16 


18 


rt \ rt rt rt. rt 

219; 309 


211 ; 317 


• Art AAA 

196; 332 




15 


■ A A rtrtrt 

133; 203 


127; 


209 


115; 221 




17 


225; 319 


A 1 *T A A *t 

217 ; 327 


A rt 1 A A rt 

201 ; 343 




10 


138; 210 


131; 


217 


119; 229 




18 


Art! rtrtrt 

231 ; 329 


AAA AAA 

222; 338 


rt A A A V A 

206; 354 
















19 


237; 339 


228; 348 


210; 366 




17 


142; 218 


135; 


225 


122; 238 




20 


243; 349 


234; 358 


216; 377 




18 


148; 226 


139; 


233 


125; 247 














19 


150; 234 


143; 


241 


129; 266 


17 


17 


rt At rt rt j rt 

249; 346 


240; 356 


rt rt m rt w rt 

223; 372 




20 


155; 241 


147; 


249 


132; 264 




18 


255; 357 


rt a rt rtrtrt 

246; 366 


rtrtrt Art a 

228; 384 
















19 


rt A rt rtrtrt 

262 ; 367 


252 ; 377 


A A A AAA* 

234; 395 


13 


13 


1 A t\ AAA 

142; 209 


136; 


215 


• rt r a a a 

125; 226 




20 


268; 378 


: I r o rt A u 

258; 388 


rtrtrt A AW 

239; 407 




14 


147; 217 


141; 


223 


t rt rt ,l n - 

129; 235 














15 


152; 226 


145; 


232 


133; 244 


18 


18 


rtrtrt rtrtrt 

280; 386 


rt rt nrtrt 

270; 396 


252; 414 




16 


156; 234 


150; 


240 


136; 254 




19 


rt rt rt rt 

287; 397 


277; 407 


rt m- rt a rt rt 

258; 426 




17 


161; 242 


164; 


249 


140; 263 




20 


294 j 408 


283; 419 


263; 439 




18 


166; 250 


158; 


258 


144; 272 


19 


19 


313; 428 


303; 438 


283; 458 




19 


171; 268 


163; 


266 


147; 282 




20 


320; 440 


309; 451 


289; 471 




20 


175; 267 


167; 


275 


151; 291 
























20 


20 


348; 472 


337; 483 


315; 505 


14 


14 


166; 240 


160; 


246 


147; 269 














15 


171; 249 


164; 


256 


151; 269 














16 


176; 268 


169; 


265 


155; 279 














17 


182; 266 


172; 


276 


159; 289 














18 


187; 275 


179; 


283 


163 ; 299 













Table A4 



The Wilcoxon signed ranks test for two related samples 

See § 10.4. The number of pairs of observations is n. The table gives the values 
of T (defined as the sum of positive ranks, or the sum of negative ranks, which- 
ever is the smaller) for various values of P (the probability of a value of T equal 
to or less than the tabulated value if the null hypothesis is true). If there are 
more than 25 pairs of observations, use the method described at the end of § 10.4.
Adapted from Wilcoxon and Wilcox (1964), with permission. 
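
The probability described above can be reproduced for small n by listing every assignment of signs to the ranks. This is only a sketch under stated assumptions (no tied or zero differences; the name `signed_rank_two_tail_p` is illustrative), and it becomes slow well before n = 25 because it visits all 2ⁿ sign patterns.

```python
from fractions import Fraction
from itertools import product


def signed_rank_two_tail_p(n, t):
    """Exact P that the smaller signed-rank sum is <= t for n pairs.

    Assumes no ties and no zero differences, so the ranks are 1..n and
    each rank is positive or negative with equal probability.
    """
    total = n * (n + 1) // 2
    count = 0
    for signs in product((0, 1), repeat=n):
        positive = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        # T is whichever of the two rank sums is the smaller.
        if min(positive, total - positive) <= t:
            count += 1
    return Fraction(count, 2 ** n)


if __name__ == "__main__":
    for n in (4, 5, 6):
        print(n, float(signed_rank_two_tail_p(n, 0)))
```

With T = 0 this gives 0.125 for n = 4 and 0.0625 for n = 5, which is why P = 0.05 cannot be reached with so few pairs.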



              P value (two tail)

  n      0.05               0.02     0.01

  4    †0 (P = 0.125)        —        —
  5    †0 (P = 0.0625)       —        —
  6      0                   —        —
  7      2                   0        —
  8      4                   2        0
  9      6                   3        2
 10      8                   5        3
 11     11                   7        5
 12     14                  10        7
 13     17                  13       10
 14     21                  16       13
 15     25                  20       16
 16     30                  24       20
 17     35                  28       23
 18     40                  33       28
 19     46                  38       32
 20     52                  43       38
 21     59                  49       43
 22     66                  56       49
 23     73                  62       55
 24     81                  69       61
 25     89                  77       68



† It is not possible to reach a value of P as small as 0.05 with such small samples
(see §§ 6.2 and 10.4). The values of P for T = 0 are given.






Table A5

The Kruskal-Wallis one way analysis of variance on ranks (independent
samples)

See § 11.5. For each value of H, the table gives the exact value of P (the
probability of observing a value of H equal to or greater than the tabulated value
if the null hypothesis is true, found from the randomization distribution of rank
sums). This table deals only with k = 3 groups, the number of observations
(n₁, n₂, and n₃) in each being up to 5. For larger samples or more groups use the
method described at the end of § 11.5. From Kruskal and Wallis (1952, J. Amer.
Statist. Ass. 47, 614; 48, 910) with permission of the author and publisher.
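
The randomization distribution referred to above can be generated directly when the groups are this small. The sketch below is an illustration rather than the procedure used to build the table: it assumes no ties, uses the standard form H = 12/(N(N + 1)) Σ Rᵢ²/nᵢ − 3(N + 1), which is not restated in this caption, and the names `kruskal_wallis_h` and `exact_p` are hypothetical.

```python
from itertools import combinations


def kruskal_wallis_h(rank_sums, sizes):
    """H computed from the rank sum and size of each group (no ties)."""
    n_total = sum(sizes)
    weighted = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def _rank_sum_splits(ranks, sizes):
    """Yield the group rank sums for every way of dealing ranks into groups."""
    if len(sizes) == 1:
        yield (sum(ranks),)
        return
    for group in combinations(ranks, sizes[0]):
        rest = [r for r in ranks if r not in group]
        for tail in _rank_sum_splits(rest, sizes[1:]):
            yield (sum(group),) + tail


def exact_p(sizes, h_observed):
    """Proportion of equally likely allocations with H >= h_observed."""
    ranks = list(range(1, sum(sizes) + 1))
    hs = [kruskal_wallis_h(rs, sizes) for rs in _rank_sum_splits(ranks, sizes)]
    # A small tolerance guards against floating-point rounding in H.
    return sum(1 for h in hs if h >= h_observed - 1e-9) / len(hs)


if __name__ == "__main__":
    print(exact_p((2, 1, 1), 2.7))
    print(exact_p((2, 2, 1), 3.6))
```

For sample sizes 2, 1, 1 with H = 2.7 and for 2, 2, 1 with H = 3.6 this enumeration gives P = 0.5 and P = 0.2, in agreement with the corresponding table entries.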



Sample sizes                      Sample sizes
n₁   n₂   n₃      H        P      n₁   n₂   n₃      H        P

2 


1 


1 


2-7000 


0-600 


4 


2 


1 


4-8214 


0 057 


















4-6000 


0076 


2 


2 


1 


3-6000 


0-200 








40179 


0-114 


2 


2 


2 


4-67W 


0067 


















3-7143 


0-200 


4 


2 


2 


6 0000 


0-014 


















6-3383 


0-033 


















61260 


0062 


3 


1 


1 


3-2000 


0-300 








4-4683 


0100 


















41667 


0-105 


3 


2 


1 


4-2857 


0100 


















3-8671 


01 33 


4 


3 


1 


6-8333 


0-021 


















6-2083 


0-060 


3 


2 


2 


5-3572 


0029 








6-0000 


0-067 








4-7143 


0048 








4-0666 


0093 








4-6000 


0-067 








3-8889 


0129 








4-4643 


0105 






















4 


3 


2 


6-4444 


0-008 


3 


3 


1 


6- 1429 


0043 








6-3000 


0-011 








4-6714 


0-100 








6-4444 


0-046 








4 0000 


01 29 








5-4000 


0061 


















4-5111 


0098 


3 


3 


2 


6-2600 


0 011 








44444 


0-102 








6 3611 


0-032 


















5-1889 


0061 


















4-6666 


0100 


4 


3 


3 


6-7466 


0010 








4-2600 


0121 








6-7091 


0013 


















5-7909 


0046 


3 


3 


3 


7-2000 


0-004 








6-7273 


0050 








6-4889 


0011 








4-7091 


0-092 








6-6889 


0029 








4-7000 


0101 








6-6000 


0060 


















60667 


0 086 


4 


4 


1 


6-6687 


0010 








4-6222 


0100 








6- 1667 


0 022 


















4-9667 


0048 


4 


1 


1 


3-6714 


0-200 








4-8667 


0-054 









Sample sizes
n₁   n₂   n₃      H









4 1667 


0 082 








4 0667 


0-102 


4 


4 


2 


70364 


0 006 








6-8727 


0-011 








6-4646 


0-046 








6.2364 


0-052 








4-6646 


0098 








4-4466 


0103 


4 


4 


3 


71439 


0010 








71364 


0011 








6-6986 


0049 








6 6768 


0051 








4-6466 


0-099 








4-4773 


0.102 


4 


4 


4 


7-6638 


0-008 








7-5386 


0011 








6-6923 


0049 








6-6638 


0-054 








4 6639 


0097 








4-6001 


0104 


6 


1 


1 


3-8671 


0-143 


5 


2 


1 

* 


6-2600 


0-036 








6-0000 


0-048 








4-4600 


0071 








4-2000 


0-095 








40600 


0-119 


5 


2 


2 


6-6333 


0 008 








6-1333 


0-013 








6- 1600 


0034 








50400 


0-066 








4-3733 


0090 








4-2933 


01 22 


5 


3 


1 


6-4000 


0-012 








4-9600 


0-048 








4-8711 


0052 








40178 


0095 








3-8400 


0-1 23 


5 


3 


2 


6-9091 


0009 








6-8218 


0010 








6-2609 


0 049 








5-1066 


0062 








4-6509 


0 091 








4 4946 


0-101 



Sample sizes
n₁   n₂   n₃      H



7 0788 
6-9818 
5-6485 

5- 5152 
4-5333 
4-4121 

6- 9545 

6- 8400 
4-9865 

4- 8600 
3-9873 

3- 9600 

7- 2045 
71182 

5- 2727 

5- 2682 

4- 6409 
45182 

7-4449 
7-3949 

6- 6564 

5- 6308 
4-5487 
4-6231 

7- 7604 
7-7440 

6- 6571 

6- 6178 
4-6187 
4-5527 

7- 3091 

6- 8364 
61273 

4- 9091 
41091 
40364 

7- 3385 
7-2692 

5- 3385 
5-2462 
4-6281 
4-6077 



0009 
0011 
0 049 
0051 
0-097 
0109 

0008 
0011 
0 044 
0056 
0098 
0102 

0 009 
0010 
0049 
0 050 
0 098 
0101 

0010 
0011 
0-049 
0050 
0 099 
0103 

0 009 
0011 
0049 
0050 
0100 
0102 

0009 
0011 
0 046 
0053 
0-086 
0105 

0010 
0010 
0047 
0 061 
0097 
0100 









Sample sizes                      Sample sizes
n₁   n₂   n₃      H        P      n₁   n₂   n₃      H        P


5 0 


3 


7-5780 


0-010 




5-6429 


A AHA 

0060 






7-5429 


0-010 




i v A An 

4-5229 


0099 






5-7065 


0-046 




4-5200 


0101 






66264 


0051 












4-5451 


0100 


6 5 5 


8 0000 


0-009 






45363 


0102 




7-9800 


0010 












5-7800 


0049 


5 6 


4 


7-8229 


0010 




5-6600 


0-051 






77914 


0010 




4-6600 


0-100 






5-6657 


0049 




4-5000 


0-102 






Table A6 



The Friedman two way analysis of variance on ranks for randomized block experiments 

See § 11.7. For each value of S the table gives the exact value of P (the probability of observing a value of S equal to or greater
than the tabulated value if the null hypothesis is true, found from the randomization distribution of rank sums). Approximate P
values are given at the head of the column. If the number of treatments, k, or the number of observations per treatment (= number
of blocks), n, is too large for this table, use the method described at the end of § 11.7. From Friedman, M. (1937, J. Amer. Statist. Ass.
32, 688), by permission of the author and publisher.
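
The exact P values in this table can be checked for very small designs by enumerating every possible ranking within every block. The following is a sketch under assumptions that are not spelled out in this caption: S is taken to be the sum of squared deviations of the treatment rank sums from their mean, there are no ties within blocks, and the names `friedman_s` and `exact_p` are illustrative only.

```python
from itertools import permutations, product


def friedman_s(rank_sums):
    """Sum of squared deviations of the treatment rank sums from their mean."""
    mean = sum(rank_sums) / len(rank_sums)
    return sum((r - mean) ** 2 for r in rank_sums)


def exact_p(k, n, s_observed):
    """Proportion of equally likely within-block rankings with S >= s_observed."""
    orders = list(permutations(range(1, k + 1)))
    count = total = 0
    # Under the null hypothesis each block is an independent random ordering.
    for blocks in product(orders, repeat=n):
        rank_sums = [sum(column) for column in zip(*blocks)]
        total += 1
        if friedman_s(rank_sums) >= s_observed - 1e-9:
            count += 1
    return count / total


if __name__ == "__main__":
    print(round(exact_p(3, 3, 18), 3))   # k = 3 treatments, n = 3 blocks
    print(round(exact_p(4, 2, 20), 3))   # k = 4 treatments, n = 2 blocks
```

The number of arrangements is (k!)ⁿ, so this brute-force check is feasible only for the smallest entries; for anything larger the method at the end of § 11.7 is the practical route.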





Number of treatments k = 3

No. of        P ≈ 0.05         P ≈ 0.01         P ≈ 0.001
blocks n      S     P          S     P          S     P

   3          18   0.028       —     —          —     —
   4          26   0.042       32   0.0046      —     —
   5          32   0.039       42   0.0085      50   0.0008
   6          42   0.029       54   0.0081      72   0.0001
   7          50   0.027       62   0.0084      86   0.0003
   8          50   0.047       72   0.0099      98   0.0009
   9          56   0.048       78   0.0100     114   0.0007
  10          62   0.046       96   0.0075     126   0.0008


Number of treatments k = 4

No. of        P ≈ 0.05         P ≈ 0.01         P ≈ 0.001
blocks n      S     P          S     P          S     P

   2          20   0.042       —     —          —     —
   3          37   0.033       —     —          —     —
   4          52   0.036       64   0.0069      74   0.0009
   5          65   0.044       83   0.0087     105   0.0008
   6          76   0.043      100   0.0100     128   0.0009


Number of treatments k = 5

No. of        P ≈ 0.05         P ≈ 0.01         P ≈ 0.001
blocks n      S     P          S     P          S     P

   3          64   0.045       76   0.0078      86   0.0009






Table A7 

Table of the critical range (difference between rank sums for any two
treatments) for comparing all pairs in the Kruskal-Wallis nonparametric
one way analysis of variance (see §§ 11.5 and 11.9)

Values for which an exact P is given are abridged from the tables of McDonald
and Thompson (1967); the remaining values are abridged from Wilcoxon and
Wilcox (1964). Reproduction by permission of the authors and publishers. † Not
attainable. Number of treatments (samples) = k. Number of observations
(replicates) per treatment = n.
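
To use the critical range, all observations are ranked together, the ranks are summed for each treatment, and every pair of rank sums is compared. The sketch below illustrates that bookkeeping only. It assumes no ties and equal numbers of observations per treatment, and it treats a difference equal to or greater than the critical range as reaching it, which is an assumption here; the function names and the critical value 17 are placeholders, so the real figure must be read from the table.

```python
from itertools import combinations


def pooled_rank_sums(samples):
    """Rank all observations together (no ties) and sum ranks per treatment."""
    pooled = sorted(x for sample in samples for x in sample)
    rank_of = {x: i + 1 for i, x in enumerate(pooled)}
    return [sum(rank_of[x] for x in sample) for sample in samples]


def pairs_reaching_range(rank_sums, critical_range):
    """Return (i, j, difference) for each pair of treatments whose rank sums
    differ by at least the critical range."""
    return [
        (i, j, abs(rank_sums[i] - rank_sums[j]))
        for i, j in combinations(range(len(rank_sums)), 2)
        if abs(rank_sums[i] - rank_sums[j]) >= critical_range
    ]


if __name__ == "__main__":
    # Made-up data: k = 3 treatments, n = 3 observations each.
    samples = [[1.1, 2.3, 0.7], [4.0, 5.2, 3.9], [9.8, 8.1, 7.7]]
    sums = pooled_rank_sums(samples)
    print(sums)
    print(pairs_reaching_range(sums, 17))  # 17 is a placeholder value
```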









               P (approximate)                              P (approximate)
            0.01            0.05                         0.01            0.05
 k   n   crit. range  P   crit. range  P      k   n   crit. range  P   crit. range  P


m 
0 




t 




8 


0 067 


Jr 

o 


<> 
2 


16 


0016 


16 


0048 






17 


0-011 


15 


0-064 




3 


32 


0 007 


28 


0 060 




4 


27 


0011 


24 


0-045 




4 


60 


0010 


44 


0056 






39 


0 009 


33 


0048 




5 

IP 


76-8 




635 






a 


61 


0011 


43 


0049 




A 

w 


993 




832 






m 


67-6 




54-4 






7 


124-8 




104-6 








82-4 




663 






8 


152-2 




127-6 






Q 


98 1 




78-9 






9 


181-4 




1520 






10 

S W 


114-7 




92-3 






10 


212-2 




177-8 






11 

* * 


1321 




106-3 






11 

* * 


2446 




2050 






12 


160-4 




1209 






12 


278-5 




2334 






13 


1694 




136-2 






13 

a kj 


313-8 




2630 






14 


1891 




1521 






14 


360-5 




293-8 






15 


2096 




168-8 






16 


388-5 




326-7 






IS 


230-7 




185-6 






16 

a w 


427-9 




358-6 






17 


252-6 




203- 1 






17 

a 1 


468-4 




392-6 






a o 


275 0 




221-2 






18 

a o 


510-2 




427-6 






10 

a V 


298-1 




239-8 






19 

a v 


553-1 




463-6 






20 


321-8 




258-8 






20 

aw 


697-2 




500-5 




4 


2 


t 




12 


0029 


6 


2 


20 


0010 


19 


0030 




3 


24 


0012 


22 


0043 




3 


39 


0-009 


36 


0056 




4 


38 


0012 


34 


0049 




4 


67-3 




570 






5 


68-2 




48 1 






5 


93-6 




79-3 






6 


76 3 




62-9 






6 


122-8 




104-0 






7 


95-8 




79- 1 






7 


164-4 




130-8 






8 


116-8 




964 






8 


188-4 




159-6 






9 


139-2 




114-8 






9 


224-5 




190-2 






10 


1628 




134-3 






10 


262-7 




222-6 






11 


187-6 




164-8 






11 


302-9 




256-6 






12 


213-5 




176-2 






12 


344-9 




292-2 






13 


240-6 




198-5 






13 


388-7 




329-3 






14 


268-7 




221-7 






14 


434-2 




367-8 






16 


297-8 




245-7 






16 


481-3 




407-8 






16 


327-9 




270-6 






16 


530- 1 




449- 1 






17 


359-0 




296-2 






17 


580-3 




491-7 






18 


3910 




322-6 






18 


632 1 




535-5 






10 


423-8 




349-7 






19 


686-4 




680*6 






20 


467-6 




377-6 






20 


740 0 




626-9 








Table A8 



Table of the critical range (difference between rank sums for any two
treatments) for comparing all pairs in the Friedman nonparametric two
way analysis of variance (see §§ 11.7 and 11.9)

Values for which an exact P is given are abridged from McDonald and
Thompson (1967); the remaining values are abridged from Wilcoxon and Wilcox
(1964). Reproduction by permission of the authors and publishers. † Not
attainable. Number of treatments = k. Number of replicates (= number of blocks) = n.
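
For the randomized block design the observations are ranked separately within each block before the rank sums are formed. A minimal sketch of that step follows. It assumes no ties within a block and treats a difference equal to or greater than the critical range as reaching it; the function names and the critical value 7 are placeholders, not entries asserted from the table.

```python
from itertools import combinations


def block_rank_sums(blocks):
    """blocks: one row per block, one observation per treatment in each row.
    Observations are ranked within each block (no ties assumed)."""
    k = len(blocks[0])
    sums = [0] * k
    for row in blocks:
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            sums[j] += rank
    return sums


def pairs_reaching_range(rank_sums, critical_range):
    """Return (i, j, difference) for each pair of treatments whose rank sums
    differ by at least the critical range."""
    return [
        (i, j, abs(rank_sums[i] - rank_sums[j]))
        for i, j in combinations(range(len(rank_sums)), 2)
        if abs(rank_sums[i] - rank_sums[j]) >= critical_range
    ]


if __name__ == "__main__":
    # Made-up data: k = 3 treatments (columns), n = 4 blocks (rows).
    blocks = [
        [1.0, 2.0, 3.0],
        [1.5, 2.5, 3.5],
        [0.9, 3.1, 2.0],
        [1.2, 2.2, 3.3],
    ]
    sums = block_rank_sums(blocks)
    print(sums)
    print(pairs_reaching_range(sums, 7))  # 7 is a placeholder value
```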









               P (approximate)                              P (approximate)
            0.01            0.05                         0.01            0.05
 k   n   crit. range  P   crit. range  P      k   n   crit. range  P   crit. range  P


3 


■i 


t 

1 




6 


0-028 


m 
O 


A 
i 


t 




8 


0-050 




4 


8 


0 005 


7 


0 042 




3 


12 


0 002 


10 


0 067 




0 


9 


0 008 


8 


0039 




4 


14 


0 006 


12 


0054 




6 


10 


0009 


9 


0-029 




6 


16 


0-006 


14 


0-040 




7 


11 


0 008 


9 


0051 




6 


17 


0-013 


15 


0 049 




8 


12 


0 007 


10 


0039 




7 


19 



0009 


16 



0052 




9 


12 


0 013 


10 


0 048 




8 


20 


0012 


18 


0036 
















9 


22 


0 008 


19 


0037 




10 


13 


0010 


1 1 


0 037 




10 


23 


0009 



20 


0038 






14 


0 008 


11 


0 049 






24 


0-010 


21 


0-038 




12 


14 


00 12 


12 


0038 




12 


26 


0-011 


22 


0-038 




13 


15 


0 009 


12 


0 049 




13 


26 



0-011 



23 



0035 




14 


16 


0 007 


13 


0 038 




14 


27 


0-011 



24 



0 034 




15 


16 


0010 


13 


0 047 




15 


28 


0010 


24 


0045 




16 


16-6 




13-3 






16 


291 





24-4 






17 


170 




13-7 






17 


300 




26-2 






18 


1 /•«> 




141 






18 


30-9 




25-9 






19 


180 




14-4 






19 


31-7 





26 6 






20 


18-4 




148 






20 


32-5 




27-3 




4 


2 


t 




6 


0 083 


6 


2 


t 




10 


0033 




3 


9 


0007 


8 


0-049 




3 


14 


0008 


13 


0 030 




4 


11 


0005 


10 


0-026 




4 


17 


0 006 


16 


0047 




6 


12 


0013 


11 


0037 




5 


19 


0010 


17 


0047 




6 


14 


0 006 


12 


0037 




6 


21 


0010 


19 


0040 




7 


15 


0 008 


13 


0037 




7 


23 


0010 


20 


0049 




8 


16 


0009 


14 


0034 




8 


25 


0 008 


22 


0039 




9 


17 


0010 


15 


0032 




9 


26 


0012 


23 


0 043 




10 


18 


0010 


15 


0 048 




10 


28 


0009 


24 


0047 




11 


19 


0 009 


16 


0041 




11 


29 


0012 


26 


0036 




12 


20 


0008 


17 


0038 




12 


31 


0 009 


27 


0039 




13 


21 


0008 


18 


0032 




13 


32 


0010 


28 


0039 




14 


21 


0-011 


18 


0 042 




14 


33 


0011 


29 


0-040 




15 


22 


0010 


19 


0037 




15 


34 


0012 


30 


0 040 




16 


22-7 




19 






16 


36-6 




30-2 






17 


23-4 




193 






17 


36-7 




31-1 






18 


241 




19 9 






18 


37-8 




320 






19 


24-8 




20 4 






19 


38-8 




329 






20 


26-4 




210 






20 


398 




337 
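The way Table A8 is used can be sketched in a few lines of Python. This is an illustration only, not the author's procedure in §§11.7 and 11.9: the function names, the data, and the critical range passed in are all hypothetical, and the convention that a difference equal to or greater than the tabulated value counts as significant is an assumption that should be checked against the text. Each block (replicate) is ranked separately, the ranks are summed for each treatment, and the difference between every pair of rank sums is compared with the critical range read from the table for the appropriate k and n.

```python
from itertools import combinations


def ranks_within_block(values):
    """Rank one block's observations (1 = smallest); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = average_rank
        i = j + 1
    return ranks


def rank_sums(blocks):
    """Sum the within-block ranks for each of the k treatments."""
    sums = [0.0] * len(blocks[0])
    for block in blocks:
        for treatment, rank in enumerate(ranks_within_block(block)):
            sums[treatment] += rank
    return sums


def pairs_exceeding(blocks, critical_range):
    """Pairs of treatments whose rank sums differ by at least critical_range.

    critical_range is supplied by the user from Table A8; it is not computed here.
    """
    sums = rank_sums(blocks)
    found = []
    for a, b in combinations(range(len(sums)), 2):
        difference = abs(sums[a] - sums[b])
        if difference >= critical_range:
            found.append((a, b, difference))
    return found


# Hypothetical observations: n = 4 blocks (rows), k = 3 treatments (columns).
blocks = [
    [1.2, 3.4, 5.1],
    [2.0, 2.9, 4.4],
    [1.5, 3.8, 3.1],
    [0.9, 2.2, 6.0],
]
print(rank_sums(blocks))           # rank sums for treatments 0, 1, 2
print(pairs_exceeding(blocks, 7))  # 7 is a placeholder for the tabulated value
```

With these invented figures the rank sums are 4, 9, and 11, so only treatments 0 and 2 differ by as much as the placeholder critical range of 7.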








Table A9 



Rankits (expected normal order statistics)

The use of rankits to test for a normal (Gaussian) distribution is described
in §4.6. The observations are ranked, the rankit is found from the table, and
plotted against the value of the observation (or any desired transformation of the
observation). Negative values are omitted for samples larger than 10. By analogy
with the smaller samples, the rankit for the seventh observation in a sample of 11
is clearly -0.225 and that for the seventh in a sample of 12 is -0.103. The
table is Bliss's (1967) adaptation of that of Harter (1961, Biometrika 48, 151-66).
Reproduced with permission.



Rank                          Size of sample = N
order       2       3       4       5       6       7       8       9      10

  1     0.564   0.846   1.029   1.163   1.267   1.352   1.424   1.485   1.539
  2    -0.564   0.000   0.297   0.495   0.642   0.757   0.852   0.932   1.001
  3            -0.846  -0.297   0.000   0.202   0.353   0.473   0.572   0.656
  4                    -1.029  -0.495  -0.202   0.000   0.153   0.275   0.376
  5                            -1.163  -0.642  -0.353  -0.153   0.000   0.123
  6                                    -1.267  -0.757  -0.473  -0.275  -0.123
  7                                            -1.352  -0.852  -0.572  -0.376
  8                                                    -1.424  -0.932  -0.656
  9                                                            -1.485  -1.001
 10                                                                    -1.539




11 


12 


13 


14 


15 


16


17 


18 


19


20 


1 


1-586 


1-029 


1-008 


1-703 


1 730 


1-700 


1 794 


1-820 


1-844 


1-807 


2 


1002 


1 110 


1 104 


1 208 


1-248 


1 285 


1 319 


1 350 


1-380 


1 408 


3 


0-729 


0 703 


0 850 


0 901 


0 948 


0-000 


1 029 


1 000 


1 090 


1 131 


4 


0 462 


0 537 


0 003 


0002 


0-715 


0-703 


0 807 


0 848 


0 886 


0 021 


5 


0 225 


0 312 


0 388 


0 456 


0 510 


0-570 


0 019 


0 605 


0 707 


0 745 


6


0 000 


0 103 


0 191 


0-207 


0 335 


0 390 


0451 


0 502 


0 548 


0590 


7 






0 000 


0 088 


0 165 


0 234 


0 295 


0 351 


0 402 


0 448 


8 










0 000 


0077 


0 140 


0-208 


0204 


0-315 


9 














0000 


0009 


0 131 




10 


















0000 


0 002 




21 


22 


23 


24 


25 


26


27 


28 


29


30 


1 


1-889 


1 010 


1 029 


1 948 


1 905 


1 082 


1-998 


2014 


2 029 


2043 


2 


1 434 


1-458 


1-481 


1 503 


1-524 


1-544 


1 503 


1-581 


1 500 


1-010 


3 


1100 


1-188 


1-214 


1 -239 


1-203 


1285 


1 300 


1 327 


1340 


1 S05 


4 


0054 


0-985 


1 014 


1-041 


1 007 


1 091 


1 115 


1137 


1158 


1 170 


5 


0-782 


0815 


0-847 


0-877 


0-005 


0 032 


0057 


0-981 


1004 


1 020 


6


0030 


0007 


0-701 


0-734 


0704 


0 793 


0820 


0-840 


0871 


0894 


7 


0491 


0 532 


0 509 


0004 


0 037 


0 668 


0 697 


0 725 


0 752 


0 777 


8


0 302 


0 400 


0 440 


0-484 


0-510 


0 553 


0 584 


0014 


0042 


0 000 


9


0 238 


0 280 


0 330 


0 370 


0 409 


0-444 


0-478 


0 510 


0 540 


0 508 


10 


0 118 


0 170 


0218 


0 202 


0 303 


0 341 


0 377 


0 411 


0 443 


0-473 


11 


0000 


0050 


0108 


0 150 


0200 


0241 


0 280 


0310 


0-350 


0 382 


12 






0-000 


0052 


0 100 


0 144 


0185 


0224 


0 200 


0 204 


13 










0 000 


0048 


0002 


0 134 


01 72 


0 209 


14














0 000 


0044 


0 080 


0125 




















0 000 


0 041 






TABLE A9 (Continued) 



Size of sample = N

Rank order


31 


32 


33 


34 


35 


36 


37 


38 


39 


40 


1 


2056 


2 070 


2 0*2 


2095 


2107 


2118 


2129 


2 140 


2-151 


2 161 


2 


1 632 


1 647 


1 602 


1 876 


1 690 


I 704 


1 717 


1 729 


1 741 


1 753 


3 


1 383 


1-400 


1-416 


1-432 


1-448 


1-482 


I 477 


1-491 


1-504 


1-517 


4 


1108 


1-217 


1-235 


1-252 


1-269 


1 -285 


1 300 


1-315 


1-330 


1-344 


5 


1 047 


1-067 


1-087 


1-105 


1123 


1140 


1 157 


1173 


1-188 


1-203 


6 


0 017 


0-038 


0 059 


0 079 


0 008 


1016 


1 03 4 


1-051 


1 -087 


1-083 


7 


0*01 


0M24 


0 846 


0867 


0 887 


0906 


0925 


0 943 


0 960 


0-977 


8 


0 604 


0-719 


0-742 


0 764 


0-788 


0 806 


0 826 


0-845 


0 863 


0-881 


9


0 505 


0 621 


0 846 


0 870 


0 692 


0-714 


0 735 


0 755 


0 774 


0-793 


10 


0-502 


0-529 


0-558 


0-580 


0-604 


0-627 


0 649 


0-670 


0-690 


0-710 


11


0-413 


0-442 


0-469 


0-496 


0-521 


0 545 


0-568 


0-590 


0-611 


0 832 


12 


0 327 


0358 


0 387 


0-414 


0 441 


0 406 


0 490 


0 514 


0 538 


0 557 


13 


0 243 


0 276 


0 307 


0 336 


0 364 


0-390 


0-416 


0 440 


0 463 


0-486 


14 


0161 


0 196 


0-228 


0 259 


0-289 


0 317 


0 343 


0 369 


0 393 


0-417 


15 


0-080 


0 117 


0 151 


0 184 


0 215 


0245 


0-273 


0 300 


0 325 


0 350 


16 


0 000 


0 039 


0070 


0 110 


0-143 


01 74 


0 203 


0-232 


0-258 


0-284 


17 






0000 


0037 


0071 


0 104 


0 135 


0 165 


0 193 


0220 


18 










0 000 


0035 


0067 


0 099 


0.128 


0156 


19














o-ooo 


0 033 


0 064 


0 094 


20 


















0-000 


0 031 




41 


42 


43 


44 


45 


46


47 


48 


49 


50 


1


2171 


2180 


2190 


2-100 


2-208 


2-218 


2 225 


2-233 


2-241 


2-249 


2 


1-765 


1-776 


1-7M7 


1-797 


1-807 


1-817 


1 827 


1-837 


1 846 


1-855 


3 


1-530 


1 542 


1 554 


1-565 


1-577 


1-588 


1-598 


1 609 


1 619 


I 029 


4 


1 357 


1 370 


1 383 


1 -396 


1-408 


1-420 


1-431 


1-442 


1-453 


1-464 


5 


1 218 


1 232 


1 246 


1-259 


1-272 


1-284 


1-298 


1-308 


1-320 


1-331 


6 


1 000 


1114 


1 128 


1 142 


1 156 


1 169 


I 182 


1 194 


1 207 


1 218 


7 


0 093 


1 009 


1 024 


1 039 


1 054 


I 088 


1-081 


1 094 


1-107 


1119 


8 


0 898 


0 915 


0-931 


0 946 


0-081 


0 976 


0090 


1 004 


1-017 


1 030 


9


0-811 


0-828 


0 845 


0861 


0877 


0 892 


0-907 


0921 


0 935 


0 949 


10 


0 720 


0 747 


0 764 


0 781 


0 798 


0814 


0829 


0844 


0 859 


0-873 


11


0 651 


0 671 


0 889 


0 707 


0 724 


0 740 


0 757 


0 772 


0-787 


0-802 


12 


0 578 


0-598 


0 617 


0 636 


0 654 


0 671 


0 68H 


0-704 


0-720 


0-735 


13 


0 507 


0 528 


0 54* 


0 588 


0-588 


0-604 


0 622 


0 039 


0 655 


0-671 


14 


0 430 


0-461 


0-482 


0 502 


0 522 


0 540 


0 559 


0 576 


0 593 


0 610 


15 


0373 


0-396 


0418 


0-439 


0 459 


0-479 


0498 


0516 


0534 


0551 


16 


0300 


0 333 


0 355 


0 377 


0 398 


0-410 


0 438 


0-457 


0-476 


0-494 


17 


0246 


0-270 


0-204 


0317 


0 339 


0360 


0-381 


0400 


0419 


0-438 


18 


0 183 


0 209 


0 234 


0 258 


0 281 


0303 


0 324 


0 345 


0 364 


0 384 


19 


0 122 


0 149 


01 75 


0 200 


0 224 


0247 


0-269 


0 290 


0310 


0 330 


20 


0061 


0-089 


0 no 


0 142 


0 187 


0 191 


0 214 


0 236 


0257 


0-278 


21 


0 000 


0 030 


0058 


0085 


0 111 


0 136 


0160 


0 183 


0205 


0-227 


22 






0 000 


0 028 


0 055 


0 081 


0 106 


0 130 


0 153 


0 178 


23 










0 000 


0027 


0053 


0078 


0 102 


0 125 


24 














0000 


0026 


0051 


0075 


25 


















o-ooo 


0025 
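The entries of Table A9 can be checked, or extended to sample sizes not tabulated, by computing the expected value of the appropriate order statistic of a standard Gaussian sample. The following Python sketch is an illustration under stated assumptions, not the method by which Harter's table was produced: the function names are hypothetical, rank 1 is taken to be the largest observation (as the signs in the table imply), and the integral is evaluated by Simpson's rule on a fixed grid.

```python
import math


def std_normal_pdf(x):
    """Density of the standard Gaussian distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Probability that a standard Gaussian variable is less than x."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def rankit(rank, n, lo=-10.0, hi=10.0, steps=4000):
    """Expected value of the rank-th largest of n standard Gaussian observations."""
    if not 1 <= rank <= n:
        raise ValueError("rank must lie between 1 and n")
    # logarithm of n! / ((rank - 1)! (n - rank)!)
    log_coef = math.lgamma(n + 1) - math.lgamma(rank) - math.lgamma(n - rank + 1)

    def integrand(x):
        below = std_normal_cdf(x)   # chance that one observation is smaller than x
        above = std_normal_cdf(-x)  # chance that one observation is larger than x
        if below <= 0.0 or above <= 0.0:
            return 0.0
        # rank - 1 observations lie above x and n - rank lie below it
        log_weight = (log_coef + (rank - 1) * math.log(above)
                      + (n - rank) * math.log(below))
        return x * math.exp(log_weight) * std_normal_pdf(x)

    # composite Simpson's rule; steps must be even
    h = (hi - lo) / steps
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3.0


def rankit_pairs(observations):
    """Pair each observation with its rankit, ready for plotting."""
    ordered = sorted(observations, reverse=True)  # rank 1 = largest
    n = len(ordered)
    return [(obs, rankit(r, n)) for r, obs in enumerate(ordered, start=1)]


print(round(rankit(7, 11), 3))  # -0.225, as stated in the heading of the table
print(round(rankit(7, 12), 3))  # -0.103
```

Plotting the second member of each pair returned by `rankit_pairs` against the first gives the rankit plot described in the heading of the table.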






References 



Anon. (1932). Illustrated London News 180, 1057.

Bailey, N. T. J. (1964). The elements of stochastic processes. Wiley, New York.

Bailey, N. T. J. (1967). The mathematical approach to biology and medicine. Wiley, London.

Bain, W. A. and Batty, J. E. (1956). Inactivation of adrenaline and noradrenaline by human and other mammalian liver in vitro. Br. J. Pharmacol. 11, 53-7.

Bartlett, M. S. (1947). The use of transformations. Biometrics 3, 39-52.

Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Phil. Trans. R. Soc. 53, 370.

Bernard, C. (1865). An introduction to the study of experimental medicine. Collier Books edition (1961). Collier Books, New York.

Bliss, C. I. (1947). 2 × 2 factorial experiments in incomplete groups for use in biological assays. Biometrics 3, 69-88.

Bliss, C. I. (1967). Statistics in biology, Vol. 1. McGraw-Hill.

Boyd, I. A. and Martin, A. R. (1956). The end-plate potential in mammalian muscle. J. Physiol., Lond. 132, 74-91.

Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations. J. R. statist. Soc. B26, 211-43.

Brownlee, K. A. (1965). Statistical theory and methodology in science and engineering, 2nd edn. Wiley, New York.

Burn, J. H., Finney, D. J., and Goodwin, L. G. (1950). Biological standardisation, 2nd edn. Oxford University Press.

Burnstock, G. and Holman, M. E. (1962). Spontaneous potentials at sympathetic nerve endings in smooth muscle. J. Physiol., Lond. 160, 446-60.

Cochran, W. G. (1952). The χ² test of goodness of fit. Ann. math. Statist. 23, 315-45.

Cochran, W. G. and Cox, G. M. (1957). Experimental designs, 2nd edn. Wiley, New York; Chapman and Hall, London.

Colquhoun, D. (1963). Balanced incomplete block designs in biological assay illustrated by the assay of gastrin using a Youden square. Br. J. Pharmac. Chemother. 21, 67-77.

Colquhoun, D. (1968). The rate of equilibration in a competitive n drug system and the auto-inhibitory equations of enzyme kinetics: some properties of simple models for passive sensitization. Proc. R. Soc. B170, 135-54.

Colquhoun, D. (1969). A comparison of estimators for a two-parameter hyperbola. J. R. statist. Soc. Ser. C (Applied Statistics) 18, 130-40.

Colquhoun, D. and Tattersall, M. (1970). Rapid histamine assays: a method and some theoretical considerations. Br. J. Pharmac. Chemother. 38.

Cox, D. R. (1962). Renewal theory. Methuen, London. Science Paperback (1967).

Cox, D. R. and Lewis, P. A. W. (1966). The statistical analysis of a series of events. Methuen, London.

Cushny, A. R. and Peebles, A. R. (1905). The action of optical isomers. II. Hyoscines. J. Physiol., Lond. 32, 501-10.






Dews, P. B. and Berkson, J. (1954). In Statistics and mathematics in biology (Eds. O. Kempthorne, Th. A. Bancroft, J. W. Gowen, and J. L. Lush), pp. 361-70. Iowa State College Press.

Documenta Geigy scientific tables, 6th edn (1962). J. R. Geigy S.A., Basle, Switzerland.

Dowd, J. E. and Riggs, D. S. (1965). A comparison of estimates of Michaelis-Menten kinetic constants from various linear transformations. J. biol. Chem. 240, 863-9.

Draper, N. R. and Smith, H. (1966). Applied regression analysis. Wiley, New York.

Dunnett, C. W. (1964). New tables for multiple comparisons with a control. Biometrics 20, 482-91.

Durbin, J. (1951). Incomplete blocks in ranking experiments. Br. J. statist. Psychol. 4, 85-90.

Feller, W. (1957). An introduction to probability theory and its applications, Vol. 1, 2nd edn. Wiley, New York.

Feller, W. (1966). An introduction to probability theory and its applications, Vol. 2, 2nd edn. Wiley, New York.

Finney, D. J. (1964). Statistical method in biological assay, 2nd edn. Griffin, London.

Finney, D. J., Latscha, R., Bennett, B. M., and Hsu, P. (1963). Tables for testing significance in a 2 × 2 table. Cambridge University Press.

Fisher, R. A. (1951). The design of experiments, 6th edn. Oliver and Boyd, Edinburgh.

Fisher, R. A. and Yates, F. (1963). Statistical tables for biological, agricultural and medical research, 6th edn. Oliver and Boyd, Edinburgh.

Goulden, C. H. (1952). Methods of statistical analysis, 2nd edn. Wiley, New York.

Guilford, J. P. (1954). Psychometric methods, 2nd edn. McGraw-Hill, New York.

Hemelrijk, J. (1961). Experimental comparison of Student's and Wilcoxon's two sample tests. In Quantitative methods in pharmacology (Ed. H. de Jonge). North Holland, Amsterdam.

Hooke, R. and Jeeves, T. A. (1961). 'Direct search' solution of numerical and statistical problems. J. Ass. comput. Mach. 8, 212-29.

Katz, B. (1966). Nerve, muscle and synapse. McGraw-Hill, New York.

Kempthorne, O. (1952). The design and analysis of experiments. Wiley, New York.

Kendall, M. G. and Stuart, A. (1961). The advanced theory of statistics, Vol. 2. Griffin, London.

Kendall, M. G. and Stuart, A. (1963). The advanced theory of statistics, Vol. 1, 2nd edn. Griffin, London.

Kendall, M. G. and Stuart, A. (1966). The advanced theory of statistics, Vol. 3. Griffin, London.

Lindley, D. V. (1965). Introduction to probability and statistics from a Bayesian viewpoint, Part 1. Cambridge University Press.

Lindley, D. V. (1969). In his review of "The structure of inference" by D. A. S. Fraser. Biometrika 56, 453-6.

Mainland, D. (1963). Elementary medical statistics, 2nd edn. Saunders, Philadelphia.

Mainland, D. (1967a). Statistical ward rounds - 1. Clin. Pharmac. Ther. 8, 139-46.

Mainland, D. (1967b). Statistical ward rounds - 2. Clin. Pharmac. Ther. 8, 346-55.

Marlowe, C. (1604). The tragicall history of Doctor Faustus. London: Printed by V. S. for Thomas Bushell.

Martin, A. R. (1966). Quantal nature of synaptic transmission. Physiol. Rev. 46, 51-66.






Massey, H. S. W. and Kestelman, H. (1964). Ancillary mathematics, 2nd edn. Pitman, London.

Mather, K. (1951). Statistical analysis in biology, 4th edn. Methuen, London.

McDonald, B. J. and Thompson, W. A. Jr. (1967). Rank sum multiple comparisons in one- and two-way classifications. Biometrika 54, 487-97.

Mood, A. M. and Graybill, F. A. (1963). Introduction to the theory of statistics, 2nd edn. McGraw-Hill Kogakusha, New York.

Nair, K. R. (1940). Table of confidence intervals for the median in samples from any continuous population. Sankhya 4, 551-8.

Oakley, C. L. (1943). He-goats into young men: first steps in statistics. Univ. Coll. Hosp. Mag. 28, 16-21.

Oliver, F. R. (1970). Some asymptotic properties of Colquhoun's estimators of a rectangular hyperbola. J. R. statist. Soc. (Series C, Applied statistics) 19, 269-73.

Pearson, E. S. and Hartley, H. O. (1966). Biometrika tables for statisticians, Vol. 1, 3rd edn. Cambridge University Press.

Poincaré, H. (1892). Thermodynamique. Gauthier-Villars, Paris.

Rang, H. P. and Colquhoun, D. (1973). Drug Receptors: Theory and Experiment. In preparation.

Schor, S. and Karten, I. (1966). Statistical evaluation of medical journal manuscripts. J. Am. med. Ass. 195, 1123-8.

Searle, S. R. (1966). Matrix algebra for the biological sciences. Wiley, New York.

Siegel, S. (1956a). Nonparametric statistics for the behavioural sciences. McGraw-Hill, New York.

Siegel, S. (1956b). A method for obtaining an ordered metric scale. Psychometrika 21, 207-16.

Snedecor, G. W. and Cochran, W. G. (1967). Statistical methods, 6th edn. Iowa State University Press, Iowa.

Stone, M. (1969). The role of significance testing: some data with a message. Biometrika 56, 485-93.

Student (1908). The probable error of a mean. Biometrika 6, 1-25.

Taylor, D. (1957). The measurement of radioisotopes, 2nd edn. Methuen, London.

Thompson, Silvanus P. (1965). Calculus made easy. Macmillan, London.

Tippett, L. H. C. (1944). The methods of statistics, 4th edn. Williams and Norgate, London; Wiley, New York.

Trevan, J. W. (1927). The error of determination of toxicity. Proc. R. Soc. B101, 483-514.

Tukey, J. W. (1954). Causation, regression and path analysis. In Statistics and mathematics in biology (Eds. O. Kempthorne, Th. A. Bancroft, J. W. Gowen, and J. L. Lush), p. 35. Iowa State College Press, Iowa.

Wilcoxon, F. and Wilcox, Roberta A. (1964). Some rapid approximate statistical procedures. Published and distributed by Lederle Laboratories, Pearl River, New York.

Wilde, D. J. (1964). Optimum seeking methods. Prentice-Hall, Englewood Cliffs, N.J.

Williams, E. J. (1959). Regression analysis. Wiley, New York; Chapman and Hall, London.






Index 



acetylcholine release, see quantal release
adding up, see summation operation
addition rule, 18; see also probability
additivity assumption, 173
adrenaline catabolism
  fitting exponential, 234-43
  stochastic interpretation, 379
adsorption, stochastic interpretation,
all-or-nothing responses, 344-64
  linearization, see probit
  relation with IED, 348
analysis of variance, 171-213
  assays, see assays
  assumptions in, 172
  control group in, 208
  curve fitting, 214-57
  expectation of mean squares, 178, 186
  Friedman rank, 200, 209, 409 (table)
  Gaussian
    independent samples, 182, 191, 210, 234, 321
    randomized blocks, 195, 210, 286, 311, 319
  homogeneity of group variances, 176
  Kruskal-Wallis, 191, 208, 406 (table)
  mean squares, 187, 197, 229, 238
  models for observations, 172-8, 186, 196
  multiple significance tests, 207-13
  multiple regression approach, 256
  nonparametric
    independent samples, 191, 208, 400
    randomised blocks, 200, 209, 409
  one way, 182, 210
  ranks, 171, 191, 200, 208-10
  relation
    with chi-squared, 180
    with t tests, 179, 190, 196, 226, 232-4
  sum of squares, 27, 184-90, 217-20, 244-63
    additivity of, 189, 223
    working formulae, 30, 188, 224
  testing all pairs, 207-13, 410-11 (table)
  two way, 182, 210
  variance, maximum/minimum, 176
arbitrary moment in time, 84, 383
area under distribution curve, 64-9
assays, 279-364
  analytical dilution, 280
  comparative, 280



  continuous (graded) responses, 280
  designs for, 285
  direct, 344-6
  discontinuous (quantal) responses, 280, 344-64
  incomplete block, 206, 286
  interaction between responses, 286, 319
  Latin square, 286
  metameters for dose and response, 280, 287, 286, 321
  random, 285, 327
  randomised blocks, 286, 311, 319
  rapid routine, 340
  single subject, 286
  slope ratio, 281
  parallel line
    average slope, 292
    confidence limits for, 297, 308
    confidence limits, examples, 333, 317, 326, 331
    convenient base for logs, 287
    designs for, 285
    four point, see (2 + 2) dose
    interpretation, 314, 323
    (k + 1) dose, 340
    logits in, 361
    matching, 284
    numerical examples, 311-43
    optimum design, 299
    orthogonal contrasts, 303-7
    parallelism test, 300
    plotting results, 318, 330
    potency ratio, 282, 290, 208
    potency ratio, examples, 318, 324, 331, 341
    six point, see (3 + 3) dose
    slope (linear regression test), 300, 303, 306
    symmetrical, 285, 287, 289, 302, 308
    symmetrical, examples, 311, 319
    (3 + 3) dose, 284, 289, 305, 310
    (3 + 3) dose, example, 319
    (3 + 2) dose, 327
    (2 + 1) dose, 284, 340
    (2 + 2) dose, 283, 287, 302, 308
    (2 + 2) dose, example, 311
    units for doses, 316
    unsymmetrical, 285, 290, 300
    unsymmetrical, example, 321






  validity tests, 300-7
assumptions, 70-2, 86, 101-3, 111, 139, 144, 148, 153, 158, 167, 172, 205-6, 207; see also individual methods
  in fitting curves, 220, 234
  in multiple regression, 254
asymptotic relative efficiency of significance tests, 92
averages, see means

bacterial dilutions, 56
balanced incomplete blocks, see incomplete
Bayes' method, 6-8, 21, 95; see also probability and significance tests
  example in medical diagnosis, 21-4
best estimate, 101, 216, 257-72; see also bias, least squares, and likelihood
bias
  of estimates in curve fitting, 216, 266-72, 369-70
  in sampling, 3, 16, 85, 102, 389-95
  statistical, 3, 29, 216, 266-72, 369-70
binomial distribution, see distribution
bio-assays, see assays

calibration curves, 280, 332
  relation with assays, 340
card-shuffling analysis, 118, 138, 192; see also randomization tests
catabolism, exponential, see adrenaline
cell distribution, random, 55
chi-squared
  rank, 202
  tables, 129, 132, 134
  test
    continuity correction in, 129
    for goodness of fit, 132
    for more than two samples, 131
    relation with other methods, 116
    for two samples, 116, 122-32
    written as normal deviate test, 124
classification measurements, 99, 116-36
  two independent samples, 116
  two related samples, 134
coefficient of variation
  method for quantal content, 58-60
  population, 30
  sample, 30
  use, 40, 220
combinations and permutations, 50, 140, 158, 167, 192-3, 201
confidence limits, 101-15
  for binomial P, 109, 398 (table)
  for fitted straight line, 224, 231
  for half-life, 242



  interpretation, 101-3, 108, 114, 333
  for median, nonparametric, 103, 396 (table)
  for new observations, 107
    on straight line, 222
  for normally distributed variable, LM
  for potency ratio, 297, 308, 344-5
    examples, 317, 325, 331, 343
  for purity in heart index, 111
  for rate constant, 242
  for ratio
    approximate, 4Jj 107
    exact, 293, 332-40, 345
  for slope of straight line, 224
  for time constant, 242
  trustworthiness, 101-3
  for variance, L2&
  for x value read off straight line, 224, 293, 332-40
  for Y value read off straight line, 224-7, 231, 332-40
continuous distribution
  meaning, 43-4, 64-9
  population mean (expectation) of, 365-8
  see also distribution
control group in analysis of variance, 208
correlation, 5, 31, 169, 272-8
  coefficient
    Pearson, 109, 213
    Spearman (rank), 274, 277 (table)
  interpretation of, 5, 254-6, 222
covariance
  population, 31
  sample, 31
  working formula, 32
cross-over trials, 134
cumulative distribution, see distribution function
curve fitting, 214-72
  assumptions in, 220; see also analysis of variance
  best estimate, meaning of, 101, 216, 266-72
  confidence limits, see separate entry
  definition of sum of squares, 217
  errors in, 222
  exponential curve, 234-43
  hyperbola, 257-72
  least squares method, see separate entry
  linear problems, meaning of, 252
  Michaelis-Menten hyperbola, 257-72
  multiple linear, 252-6
  nonlinear, 257-72, 262, 336
  polynomial curves, 252, 253, 336
  role of statistics, 216
  straight line, 216-57
  transformations in, 221, 238, 243






data selection ('data snooping'), 168, 202
deduction, 1, 6
degrees of freedom, meaning, 29, 389
density, see probability density
dependent variable, 214, 210
discontinuous distribution, see distribution
distribution
  binomial, 43-62, 64, 59, 104, 109-14, 124, 164, 359, 365, 398 (table)
  continuous, meaning, 64-9
  cumulative, see distribution function
  discontinuous, meaning of, 43-4, 64-9, 350
  exponential, 81-5, 367-8, 380, 383, 388-95
    stochastic interpretation, 81-5, 379-95
  function, 67-9, 387, 389
    examples of, 68, 82, 346-68, 380, 383, 389
    length-biased, 389-95
  Gaussian (normal), 69-75, 96-9, 101, 345-64, 368
    approximation to binomial, 52-3, 116, 124
    tests for fit, 80
    transformations to, 71, 78, 80, 221-2
  goodness-of-fit tests
    chi-squared, 132
    probit and rankit, 80
  length-biased, 85, 389-95
  lognormal, 78-80, 107, 176, 221, 239, 346-4U
  meaning of, 43-4, 64-9
  multinomial, 44
  Poisson, 52-63, 81-5, 376-8, 388-95; see also quantal release
  skew and symmetrical, 78-80
  standard form of, 369
  standard Gaussian, 72-5, 126
  Student's t, 75-8, 148, 167
dose metameter, 280, 281
  ratio, 283
drug-receptor interaction, see adsorption
Dunnett's d statistic, 208

ED50, see median effective dose
efficiency of significance tests, 9JZ
epinephrine, see adrenaline
error
  distribution of, see distribution
  estimates of, 1-8, 28-42; see also variance
  of the first kind, 93
  homogeneity of, see homoscedastic
  limits of, see confidence limits
  of the second kind, 93



  trustworthiness of estimates of, 1-8, 101-3
estimation, see bias, least squares, likelihood, and best estimate
exp(r), 68
expectation, 365-8
  of any function, 368
  of function of two variables, 370-3
  of mean squares in analysis of variance, 178, 186
  see also mean
experimental method, meaning, 3-8
exponential
  curve fitting, 234-43
  distribution, see distribution

F ratio, see variance ratio
factorial function, 9, QQ
fiducial limits, see confidence limits
Fieller's theorem, 293
Fisher exact test for 2 x 2 table, UJL 117
  use of tables for, 122
four-point assay, see assays, parallel line, (2 + 2) dose
Friedman method, 200, 409 (table), 411 (table)
function
  expectation of, 365-6, 370-3; see also mean
  factorial, 10
  mathematical, meaning of, 9
  variance of, see variance

Gaussian (normal) distribution, see distribution
generalization, 1-8, 91, 102
Gosset, W. S., see 'Student'

half-life, 230
  confidence limits for, 242
  stochastic interpretation, 380, 385
heteroscedastic, see homoscedastic
Hill plot, 303
histogram, 44, 53, 64-8, 346-53
  area convention, 66, 350
homoscedastic, 167, 175, 221, 266, 269, 272, 281, 359
hyperbola, fitting of, 257-72, 361-4
hypothesis, 6, 87-96

IED, see individual effective dose
incomplete block designs, 206-7
  for assays, 280






independence, statistical, 20, 21, 22, 31, 44, 64, 84, 99, 276-7, 286, 375, 379, 3iU
  of contrasts, 302-7
independent
  samples
    classification measurements, 91, 116
    numerical measurements, 91, 187-61, 182
    rank measurements, 91, 187-48, 141
    see also significance tests, random, and sample
  variable in curve fitting, 214
individual effective dose, 112, 344-64
  relation with all-or-nothing response, 348
induction, 6
inference, scientific, 6-8
  precision of, 101-3, 114; see also variance, confidence limits, and bias
intervals between random events, 81-5, 374-95; see also lifetime
isotope, see radioisotope

Kruskal-Wallis method, 191, 406 (table)
  for testing all pairs, 410 (table)

Langmuir equation
  fitting of, 257-72, 361
  stochastic interpretation, 380-6
Latin square, 204
LD50, see median effective dose
least squares method
  for assays, 219
  and 'best' estimates, 216, 257-72
  for curve fitting, 216-20
  geometrical interpretation, 243-63, 259-62
  for means, 21
  for Michaelis-Menten hyperbola, 257-72
  without calculus, 27, 220
lifetime, 81-5
  of adrenaline molecule, 380
  of adsorbed molecule, 382
  of empty adsorption site, 384
  independence of when timing started, 388
  of isotope, 385-7
  length biased sample, 84, 389-95
  meaning of, 81-5, 386
  residual, 84, 383, 388-95
  twice average length, 84, 389-95
likelihood
  maximum, 8, 266-72
  technical meaning, 7, 21-4
limits of error, see confidence limits



Lineweaver-Burk plot, 266-72
logarithm
  changing base of, 291
  negative, 326
  transformation, see transformation
logistic curve, 361-4
logit transformation, 361-4
lognormal distribution, see distribution

Mann-Whitney test, 148
mean
  of any function, 366, 368
  arithmetic, population (expectation), 365-8
  arithmetic, sample
    least squares estimate, 22
    standard deviation of, 33-8, 89
    variance of, 33-8, 89
    weighted, 24, 89
  of binomial distribution, 60, 365
  deviation, 28
  of exponential distribution, 81-5, 367
  of function of two variables, 370-3
  of Gaussian (normal) distribution, 69, 366
  geometric, 26
  lifetime and residual lifetime, see lifetime
  of lognormal distribution, 78, 346-67
  of Poisson distribution, 64, 81, 366, 376
  relation with median and mode, 78-80, 101, 348, 368
  squares, 187, 197, 229, 238; see also analysis of variance
median
  effective dose (ED50), 346-64
  lifetime, see lifetime and stochastic processes
  population, 26
  relation with mean and mode, 78-80, 101, 346-64, 368
  sample, 26, 101, 108
metameter, dose and response, 280, 287; see also transformations
Michaelis-Menten equation, fitting of, 257-72, 361
minimisation, see optimisation
minimum
  effective dose, definition of, 360
  lethal dose, 360
mode, 22
  relation to mean and median, 78-80, 346-64, 368
models for observations
  fixed and random, 171, 178, 186, 204-6
  mixed, 196
multinomial distribution, 44






multiple 

comparisons, 2Q7 

linear regression, 252-6 

•ad analysis of varianoe, 2M 
multiplication 

operator, 10, 25=6 

role, 20, S76 



neuromuscular junction, see quanta I 
releaae 

non-linear regression, see curve fitting 
nonparametric methods, characteristics, 

96, 06-0 
normal 

distribution, see distributions, Gaussian 
equivalent deviation, see probit 
null hypothesis, 6, 87-96 



observational method, meaning, 5j see 

also oorrelation 
Occam's razor, 215 

operation, meaning, 9; at* also summation, 
eto. 

optimism, of estimates of error, IQ1-3 
optimisation, 262-7 
orthogonal contrasts, 302 7 

numerical examples, 311-43 

variance of, 306 



P value, from significance test, meaning, 

66-100, 201 
parallelism, test for, see assays, parallel 

line 
parameter, 4 

patternaearoh minimization, 263 
permutations, see combinations 

random, vii, 16-19 
permutation testa, see randomization tests 
Poisson distribution, see distribution 
polynomial curve fitting, 262-4, 388 
population, 4, 15, 20, 43, 64-9; see also 

standard deviation and mean 
power, of significance tests, 93-100 
prior probabilities, see probability 
probability 

addition rule, 18 

Bayes' theorem, 21-4, 95

binomial, 46, 109-14 

confidence, see confidence limits 

density, 64-9 

direct, 6-6 

distribution, see distribution 
inverse, 6-8, 87 
meaning of, 15-16, 95



probability (cont.)

multiplication rule, 20, 276 

posterior, 6-8, 21-4, 96

prior, 6-8, 21-4, 95 

significance value, 86-95, 207 

subjective, 16, 95 
probit transformation, 347, 356 

and haemolysis, 361 

linearizing sigmoid curves, 361 

test for Gaussian distribution, 80
purity in heart, assay for, 111



quadratic equation 

fitting, see curve fitting, polynomial 
solution of, 294 
quantal

release of acetylcholine

number of quanta per impulse, 57-60 
intervals between quanta, 81-5
responses, see all-or-nothing responses 
and probit transformation 
quantitative numerical measurements, 99 



radiation, 'safe dose' of, 360 
radioisotope disintegration 

errors in, 52, 60-3 

stochastic interpretation, 385-7 
random 

blocks, 171, 195, 200, 207 

Latin square, 206 

permutations, vii, 16-19 

process, 52-63, 81-5, 374-95; see also
lifetime and stochastic processes

sample 

reasons for necessity, 119 
rejection of unacceptable, 123 
selection of, 3, 16-19, 43-5 

sampling numbers, use of, vii, 16-19 
randomization tests, 96, 9ft

classification measurements, 117 

Cushny and Peebles' data, 143

numerical and rank measurements, 138, 
143. 163. 167. 160. 191. 2HQ 

rationale, 96, 111 

unacceptable randomizations, 123 

see also card shuffling analysis 
randomized blocks, 171, 195, 200, 207 
range, 28 

rank measurements, 96, 99, 116, 137, 162, 
171. 191, 200. 207-10 
correlation, 214
rankits, as test for Gaussian distribution,

80, ill (table) 
rate constant, 238 

stochastic interpretation, 380, 385 






ratio 

dose, 283 

of maximum to minimum variance of 
set, 12ft 

potency, see assays and confidence limits 

of two Gaussian variables, see confidence
limits and variance of functions

of two estimates of same variance, see
variance ratio
receptor-drug interaction, see adsorption
regression 

analysis, see curve fitting 

equation, 211 

linear, 216-257 

non-linear, 243-272 
related samples 

advantages of, 169 

classification measurements, 91, 134

rank measurements, 91, 162-66, 200 

numerical measurements, 91, 152-70, 
195, 200 

see also randomized blocks
root-mean-square deviation, see standard
deviation



sample 

length-biased, 85, 389-95; see also

lifetime and stochastic processes 
simple, 44 

small, 49, 75. 80, 89, 96-9 

strictly random, vii, 3-6, 16-19, 43-5,
117, 207; see also random
Scheffé's method, 210
scientific method, 3-8 
sign test, 153 

significance tests, see guide to particular 
tests on end sheet 
for all possible pairs, MM. 200, 207-10 
assumptions in, 70-2. 86, 101-3. HI. 
139, 144, 148, 153, 158, 167, 172. 
205-6, 207 
critique of, 93-5 
efficiency of, 91 

interpretation of, 1-8, 70-2, 86-100 
maximum variance/minimum variance, 

113. 

multiple, 19U 200, 207-10 
one-tail, 86 

parametric versus nonparametric, 96-9
randomization, see randomization tests
ranks, 96, 99, 116, 137, 162, 171
ratio of maximum to minimum variance, 

lift 
relation 

with confidence limits, 151, 155, 168, 
232 



significance (cont.)

between t tests and analysis of 

variance, 190, 196, 233 
between various methods, 116, 137,
162, 121
relative efficiency, 97 
two-tail, 88 

for variance, population value of, 128. 
simulation, 298 

six point assay, see assay, parallel line,

(3 + 3) dose
skew distributions, 78-80. 101. 348, 368 
standard 

deviation, see variance of functions 

of observation, see variance of functions

error, 33, 35, 36, 38

form of distribution, 369 

Gaussian (normal) distribution see 
distribution 
statistics 

expected normal-order, 412 

role of, 1-3, 86, 93, 98, 101, 214, 374

technical meaning, 4 
steepest-descent method, 282
stochastic processes, 1, 81-5, 374-95 

adsorption, 380-5

catabolism, 379

isotope disintegration, 52, 60-3, 385-7 
length bias, 85, 389-95
lifetime, see lifetime
meaning, 1, 81, 374

of o(Δt), 376, 378, 381
Poisson, derivation, 375-8
quantal release of acetylcholine, 57-60,
81-5

residual lifetime, see lifetime
see also distribution, exponential and
distribution, Poisson
straight-line fitting, 214-57 
'Student' (W. S. Gosset), 7J
paired t test, lfll
t distribution, 75-8
tables of, "7
test, 148 

relation with analysis of variance, 179, 

190, 196, 228, 232-4 
relation with confidence limits, 151, 168 

sum 

of products, 31 

working formula, 32 
of squared deviations (SSD), 27, 184-90,
197, 217-20, 244-57
additivity of, 189, 223 
working formula for, 30, 188, 224 
see also least squares method and
analysis of variance 
summation operator, Σ, 10-14
survey method, meaning, ft 






tables, published, vii 
tail of distribution, 61, 12 
tests

for additivity, 124

of assumptions, see assumptions

for equality of variances, 170 

for Gaussian (normal) distribution, 
probit and rankit, 89 

for goodness of fit, L12 

for Poisson distribution, L23 

of significance, see significance
threshold dose, 360, 364
time constant, 233 

stochastic interpretation, 380, 385 
transformations 

for additivity, 176-8 

for analysis of variance, LI6_ 

in assays, 280-3. 287. 340. 

in curve fitting. 221-2, 238, 243 

to Gaussian distribution, 71, 78, 80,
176-8, 221, 239, 287, 344-6

linearizing, 221-2, 238, 266-72, 353 

logarithmic, 78, 176, 221-2, 238, 280-3,
287, 291, 344-6, 361-4

logit, 301 

normalizing, see transformations, to
Gaussian

probit, 80, 34L 363 64 

rankit, 80 

reciprocal, 266-72 
2x2 table 

independent samples, 116-134

related samples, 134 
two samples, difference between, see significance
tests; and guide on end sheet

unacceptable randomizations, 123

validity of assays, ftl 
variability, measures of, 28 
variance of functions of observations 
of any function (approx.), 39-40
of any linear function, 39, 225, 307 
of difference, 22 

of function of correlated variables, 27, 41 

of linear functions, 39, 225. 307 

of logarithm of variable (approx.), 40

of mean, 33, 35, 36, 38, 101 

meaning, 33-42 

multipliers, definition, 295 

population, xviii, 28, 2ft 

of binomial distribution, 50, 359, 368 

constancy of, 16L 175-6. 221, 266. 
289^ 272^ 281_, 25ft 

definition of, 368-9 

estimation from probit plot, 353-64

examples, 51, 60-3 

of exponential distribution, 368 



variance (cont.)

of lognormal distribution, 78-9. 
346-57 

of Poisson distribution, 55, 368; see
also distribution and quantal
release

of potency ratio, see confidence limits
of product of two variables, 40. 
of ratio of two variables 
(approx.), 4T, 107, 296 
(exact), see confidence limits
of reciprocal of variable (approx.), 41, 212 
sample, xviii, 28, 29, 269 
bias of, 29, 307, 369 
ratio of maximum to minimum, lift 
ratio of two estimates, see variance 
ratio 

when population mean known, 29. 
307. 209 

working formula for, 30 
of slope of straight line, 225 
of sum 

or difference, 22 

of N variables, 37, 202

of variable number of variables, 41, 
58-60, 370-3 
of value 

of x read off straight line, see confidence
limits

of Y read off straight line, 227. 
of variable 

+ constant, 22

× constant, 38
of variance, 128. 

of weighted arithmetic mean, 39, 292 
see also confidence limits
variance ratio (F) 
less than one, 182 
meaning of, 176. 12ft 
relation 

with chi-squared, 180 
with Student's t, 179, 190, 196, 226,
232-4
tables of, 1SJ 
virginity. 111 

waiting time, see lifetime and stochastic
processes
paradox, 84, 374-95 
weighting, 25, 220, 272, 292 
Wilcoxon 

signed ranks test for two related 
samples, 160, 405 (table) 

test (Mann-Whitney) for two independent
samples, 143, 402 (table)

Yates' correction for continuity, 126, 129, 
122 






This book is based on lectures given to medical science students with elementary mathematical knowledge and no statistical knowledge.

The aim has been to include those topics which are of interest to all laboratory workers, the discussion being based mainly on medical, physiological, and pharmacological problems. Some subjects, such as random ('stochastic') processes, fitting curved lines other than polynomials, calibration curves and parallel line assays, which are usually omitted from elementary courses, have been discussed here because of their great practical importance.

An attempt has been made to convey a critical understanding while keeping the mathematics elementary (except for the appendices). For example, by starting the discussion of significance tests with the nonparametric approach, every step in obtaining the result is shown using little or no algebra. There are worked numerical examples for the methods described.

(Also available in hard covers) 



OXFORD UNIVERSITY PRESS 



£2.75 net in UK



